hybkit
- Project components:
hybkit toolkit of command-line utilities for manipulating, analyzing, and plotting hyb-format data.
The hybkit python API, an extendable documented codebase for creation of custom analyses of hyb-format data.
Integrated analysis of predicted secondary structure (fold) information for the API and command-line utilities.
Example analyses for publicly available qCLASH hybrid sequence data implemented in each of the command-line scripts and hybkit Python API.
- Hybkit Toolkit:
The hybkit toolkit includes several command-line utilities for manipulation of hyb-format data:
Utility        Description
hyb_check      Parse hyb (and fold) files and check for errors
hyb_eval       Evaluate hyb (and fold) records to identify / assign segment types and miRNAs using custom criteria
hyb_filter     Filter hyb (and fold) records to a specific custom subset
hyb_analyze    Perform an energy, type, miRNA, target, or fold analysis on hyb (and fold) files and plot results
These scripts are used on the command line with hyb (and associated "vienna" or "CT") files. For example, to filter a hyb and corresponding vienna file to contain only hybrids with a sequence identifier containing the string "kshv":
Example:
$ hyb_filter -i my_hyb_file.hyb -f my_hyb_file.vienna --filter any_seg_contains kshv
Further detail on the usage of each script is provided in the hybkit Toolkit section of this documentation.
- Hybkit API:
Hybkit provides a Python3 module with a documented API for interacting with records in hyb files and associated vienna or CT files. This capability was inspired by the BioPython Project. The primary utility is provided by a class for hyb records (HybRecord), a class for fold records (FoldRecord), and file-iterator classes (HybFile, ViennaFile, CTFile, HybFoldIter). Record attributes can be analyzed, set, and evaluated using included class methods.
For example, a workflow to print the identifiers of only those records within a ".hyb" file that contain a long noncoding RNA (lncRNA) segment can be performed as such:
    #!/usr/bin/env python3
    import hybkit

    in_file = '/path/to/my_hyb_file.hyb'

    # Open a hyb file as a HybFile Object:
    with hybkit.HybFile.open(in_file, 'r') as hyb_file:
        # Return each line in a hyb file as a HybRecord object
        for hyb_record in hyb_file:
            # Analyze each record to assign segment types
            hyb_record.eval_types()
            # If the record contains a long noncoding RNA type, print the record identifier.
            if hyb_record.prop('any_seg_type_contains', 'lncRNA'):
                print(hyb_record.id)
Further documentation on the hybkit API can be found in the hybkit API section of this documentation.
- Example Analyses:
Hybkit provides several example analyses for hyb data using the utilities provided in the toolkit. These include:
Analysis                   Description
Type/miRNA Analysis        Quantify sequence types and miRNA types in a hyb file
Target Analysis            Analyze targets of a set of miRNAs from a single experimental replicate
Grouped Target Analysis    Analyze and plot targets of a set of miRNAs from pooled experimental replicates
Fold Analysis              Analyze and plot predicted miRNA folding patterns in miRNA-containing hybrids
These analyses provide results in both tabular and graph form. As an illustration, the example summary analysis returns the contained hybrid sequence types as both a CSV table and a pie chart.
Further detail on each provided analysis can be found in the Example Analyses section of this documentation.
- Installation:
- Dependencies:
Python3.8+
matplotlib >= 3.7.1 (Hunter JD. (Computing in Science & Engineering 2007))
BioPython >= 1.79 (Cock et al. (Bioinformatics 2009))
typing_extensions <https://pypi.org/project/typing-extensions/> >= 4.8.0
- Via PyPI / Python PIP:
The recommended installation method is via hybkit's PyPI Package Index using python3 pip, which will automatically handle version control and dependency installation:
$ python3 -m pip install hybkit
- Via Conda:
For users of conda, the hybkit package and dependencies are hosted on the Bioconda channel, and can be installed using conda:
$ conda install -c bioconda hybkit
- Via Docker/Singularity:
The hybkit package is also available as a Docker image and Singularity container, hosted via the BioContainers project on quay.io. To pull the image via docker:
$ docker pull quay.io/biocontainers/hybkit:0.3.3--pyhdfd78af_0
To pull the image via singularity:
$ singularity pull docker://quay.io/biocontainers/hybkit:0.3.3--pyhdfd78af_0
- Manually Download and Install:
Use git to clone the project's Github repository:
$ git clone https://github.com/RenneLab/hybkit
OR download the zipped package:
$ curl -OL https://github.com/RenneLab/hybkit/archive/master.zip
$ unzip master.zip
Then install using python setuptools:
$ python setup.py install
Further documentation on hybkit usage can be found in this documentation.
- Setup Testing:
Hybkit provides a suite of unit tests to maintain stability of the API and script functionalities. To run the API test suite, install pytest and run the tests from the root directory of the hybkit package:
$ pip install pytest
$ pytest
Command-line scripts can be tested by running the auto_test.sh script in the auto_tests directory:
$ ./auto_tests/auto_test.sh
- Copyright:
hybkit is a free, sharable, open-source project. All source code and executable scripts contained within this package are considered part of the "hybkit" project and are distributed without any warranty or implied warranty under the GNU General Public License v3.0 or any later version, described in the "LICENSE" file.
hybkit Hyb File Specification
Version: v0.3.4
The ".hyb" (hyb file) format is described by [Travis2014] along with the Hyb software package as a "gff-related format that contains sequence identifiers, read sequences, 1-based mapping coordinates, and annotation information for each chimera".
Each line in a hyb file (a hyb "record") contains information about an RNA sequence read identified as a chimera by an RNA hybridization analysis. Each line contains 15 or 16 columns separated by tab characters ("\t") and provides information on each of the two aligned segments identified within the sequence read. The columns are described as follows by [Travis2014]:
    Column 1, unique sequence identifier.
    Column 2, read sequence [...].
    Column 3, predicted binding energy in kcal/mol.
    Columns 4–9, mapping information for first fragment of read: name of matched transcript, coordinates in read, coordinates in transcript, mapping score.
    Columns 10–15, mapping information for second fragment of read.
    Column 16 (optional, [...]), list of annotations in the format: "feature1=value1; feature2=value2;..."
The hybkit project uses an extended version of this description, including assigning columns reference names, and defining allowed flags.
Columns
#     Name               Description
1     id                 Hybrid Read Identifier
2     seq                Read Nucleotide Sequence
3     energy             Predicted Gibbs Free-Energy of Intra-Hybrid Folding
4     seg1_ref_name      Segment 1 Mapping Reference Identity
5     seg1_read_start    Segment 1 Mapping Start on Read
6     seg1_read_end      Segment 1 Mapping End on Read
7     seg1_ref_start     Segment 1 Mapping Start on Reference
8     seg1_ref_end       Segment 1 Mapping End on Reference
9     seg1_score         Segment 1 Mapping Score
10    seg2_ref_name      Segment 2 Mapping Reference Identity
11    seg2_read_start    Segment 2 Mapping Start on Read
12    seg2_read_end      Segment 2 Mapping End on Read
13    seg2_ref_start     Segment 2 Mapping Start on Reference
14    seg2_ref_end       Segment 2 Mapping End on Reference
15    seg2_score         Segment 2 Mapping Score
16    flags              Hybrid Read Analysis Details
Flags
- Hyb Flags:
The following four flags are used by the Hyb software package ([Travis2014]). The definitions provided describe how these flags are used in the hybkit package.
count_total
- Integer: Total represented hybrid records, if records have been combined.
count_last_clustering
- Integer: Total represented hybrid records at last clustering.
two_way_merged
- {"0" or "1"} Boolean representation of whether entries with mirrored 5' and 3' hybrids were merged if the record is a combined record.
seq_IDs_in_cluster
- String: Comma-separated list of all record IDs of hybrids merged into this hybrid entry.
- hybkit Flags:
The following flags are used by hybkit.
read_count
- Integer: Number of sequence reads represented by this record. If the record is combined, this represents the total read count for all merged entries.
orient
- String: Orientation of strand. Options: "F" (Forward), "IF" (Inferred Forward), "R" (Reverse), "IR" (Inferred Reverse), "U" (Unknown), or "IC" (Inferred Conflicting).
seg1_type
- String: Assigned segment type of segment 1, ex: "miRNA" or "mRNA".
seg2_type
- String: Assigned segment type of segment 2, ex: "miRNA" or "mRNA".
seg1_det
- String: Arbitrary detail about segment 1.
seg2_det
- String: Arbitrary detail about segment 2.
miRNA_seg
- String: Indicates which (if any) segment mapping is a miRNA. Options are "N" (none), "5p" (seg1), "3p" (seg2), "B" (both), or "U" (unknown).
target_reg
- String: Assigned region of the miRNA target. Options are "5pUTR", "C" ([C]oding), "3pUTR", "NON" ([NON]coding), "N" ([N]one), or "U" ([U]nknown).
ext
- Integer: "0" or "1", Boolean representation of whether record sequences were bioinformatically extended as is performed by the Hyb software package.
dataset
- String: Label for sequence dataset id (e.g., source file), when combining records from different datasets.
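The flags column packs the keys listed above into a single string of the form "feature1=value1; feature2=value2;...". The following is a minimal sketch of reading such a string into a dictionary. The helper name `parse_flags` is hypothetical, values are left as strings, and the tolerance for optional spaces and a trailing ";" is an assumption rather than a description of hybkit's own parser.

```python
def parse_flags(flag_string):
    """Hypothetical helper: parse 'key1=value1;key2=value2;' into a dict."""
    flags = {}
    # "." is the missing-data placeholder, so it yields no flags.
    if flag_string in ('', '.'):
        return flags
    for item in flag_string.split(';'):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ";" (assumption)
        key, _, value = item.partition('=')
        flags[key] = value  # values are kept as strings in this sketch
    return flags


flags = parse_flags('read_count=718; seg1_type=miRNA; seg2_type=mRNA; miRNA_seg=5p;')
print(flags)
```

Converting values such as read_count to integers is left out here, since the conversion rules depend on each flag's declared type.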
Other Details
Item          Role
"\t" (tab)    Column Delimiter
"."           Missing Data Placeholder (equivalent to None)
".hyb"        File Suffix
".hyb.gz"     gzipped File Suffix
Disallowed    Header Lines
Disallowed    In-file Comments
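Because header lines and in-file comments are disallowed, every non-empty line of a hyb file can be treated as a record. The sketch below selects a plain-text or gzip reader from the file suffixes listed above. The helper names `open_hyb_text` and `read_hyb_lines` are hypothetical and this is not a description of how hybkit opens files.

```python
import gzip


def open_hyb_text(path):
    """Hypothetical helper: open a hyb file for reading as text."""
    if path.endswith('.hyb.gz'):
        return gzip.open(path, 'rt')  # gzipped file suffix
    if path.endswith('.hyb'):
        return open(path, 'r')  # plain file suffix
    raise ValueError(f'Unrecognized hyb file suffix: {path}')


def read_hyb_lines(path):
    """Yield each non-empty line; every such line is a record."""
    with open_hyb_text(path) as handle:
        for line in handle:
            line = line.rstrip('\n')
            if line:
                yield line
```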
Example
An example .hyb format line (courtesy of [Gay2018]):
2407_718 ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC . MIMAT0000078_MirBase_miR-23a_microRNA 1 21 1 21 0.0027 ENSG00000188229_ENST00000340384_TUBB2C_mRNA 23 49 1181 1207 1.2e-06
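As a rough illustration of the column layout, the following sketch splits the example line into the column names defined in this specification. It is not hybkit's own parser: the helper name `split_hyb_line` and the conversion of the "." placeholder to None are assumptions made for illustration.

```python
# Column names as listed in the Columns table of this specification.
HYB_COLUMNS = (
    'id', 'seq', 'energy',
    'seg1_ref_name', 'seg1_read_start', 'seg1_read_end',
    'seg1_ref_start', 'seg1_ref_end', 'seg1_score',
    'seg2_ref_name', 'seg2_read_start', 'seg2_read_end',
    'seg2_ref_start', 'seg2_ref_end', 'seg2_score',
    'flags',
)


def split_hyb_line(line):
    """Hypothetical helper: map a tab-delimited hyb line onto column names."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) not in (15, 16):
        raise ValueError(f'Expected 15 or 16 columns, found {len(fields)}')
    if len(fields) == 15:
        fields.append('.')  # the optional flags column is absent
    # Treat the "." placeholder as missing data (None).
    return {name: (None if value == '.' else value)
            for name, value in zip(HYB_COLUMNS, fields)}


example_line = (
    '2407_718\tATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC\t.\t'
    'MIMAT0000078_MirBase_miR-23a_microRNA\t1\t21\t1\t21\t0.0027\t'
    'ENSG00000188229_ENST00000340384_TUBB2C_mRNA\t23\t49\t1181\t1207\t1.2e-06'
)
record = split_hyb_line(example_line)
print(record['id'], record['seg1_ref_name'], record['seg2_read_start'])
```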
hybkit API
The hybkit API provides a Python3 module with classes allowing parsing and manipulation of hyb-format data as python objects, including built-in analysis and plotting functionality for common tasks in hybrid sequence analysis.
Data classes for storing, evaluating, and iterating over records
Constants and settings information for hybkit classes and toolkit scripts
Class for customizable identification of segment type from reference identifiers
Classes for predefined analyses of hyb records
Plotting methods for analysis results
Support methods for executable scripts
Error classes for the hybkit package
hybkit (module)
Module storing primary hybkit classes and hybkit API.
This module contains classes and methods for reading, writing, and manipulating data in the hyb genomic sequence format ([Travis2014]). For more information, see the hybkit Hyb File Specification.
An example string of a hyb-format line from [Gay2018] is:
2407_718\tATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC\t.\tMIMAT0000078_MirBase_miR-23a_microRNA\t1\t21\t1\t21\t0.0027\tENSG00000188229_ENST00000340384_TUBB2C_mRNA\t23\t49\t1181\t1207\t1.2e-06
Hybkit functionality is primarily based on classes for storage and evaluation of chimeric genomic sequences and associated fold-information:
HybRecord      Class to store a single hyb (hybrid) sequence record
FoldRecord     Class to store predicted RNA secondary structure information for hybrid reads
Also included are classes for reading, writing, and iterating over files containing hybrid information:
HybFile        Class for reading and writing hyb-format files [Travis2014] containing chimeric RNA sequence information as HybRecord objects
ViennaFile     Class for reading and writing Vienna-format files [ViennaFormat] containing RNA secondary structure information in dot-bracket format as FoldRecord objects
CTFile         -BETA- Class for reading Connectivity Table (.ct)-format files [CTFormat] containing predicted RNA secondary-structure information as used by UNAFold as FoldRecord objects
HybFoldIter    Class for concurrent iteration over a HybFile and a ViennaFile or CTFile
HybRecord Class
- class hybkit.HybRecord(id: str, seq: str, energy: Optional[Union[float, int, str]] = None, seg1_props: Optional[Dict[str, Union[float, int, str]]] = None, seg2_props: Optional[Dict[str, Union[float, int, str]]] = None, flags: Optional[Dict[str, Any]] = None, read_count: Optional[int] = None, allow_undefined_flags: Optional[bool] = None)
Class for storing and analyzing chimeric (hybrid) RNA-seq reads in hyb format.
Hyb file (hyb) format entries are a GFF-related file format described by [Travis2014] that contain information about a genomic sequence read identified to be a hybrid by a chimeric read caller. Each line contains 15 or 16 columns separated by tabs ("\t") and provides annotations on each component. An example hyb-format line from [Gay2018]:
2407_718\tATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC\t.\tMIMAT0000078_MirBase_miR-23a_microRNA\t1\t21\t1\t21\t0.0027\tENSG00000188229_ENST00000340384_TUBB2C_mRNA\t23\t49\t1181\t1207\t1.2e-06
The columns are respectively described in hybkit as:
id, seq, energy, seg1_ref_name, seg1_read_start, seg1_read_end, seg1_ref_start, seg1_ref_end, seg1_score, seg2_ref_name, seg2_read_start, seg2_read_end, seg2_ref_start, seg2_ref_end, seg2_score, flags
(For more information, see the hybkit Hyb File Specification)
The preferred method for reading hyb records from lines is with the HybRecord.from_line() constructor:
    # line = "2407_718\tATC..."
    hyb_record = hybkit.HybRecord.from_line(line)
This is the constructor used by the HybFile class to parse hyb files. For example, to print all hybrid identifiers in a hyb file:
    with hybkit.HybFile('path/to/file.hyb', 'r') as hyb_file:
        # performs "hyb_record = hybkit.HybRecord.from_line(line)" for each line in file
        for hyb_record in hyb_file:
            print(hyb_record.id)
HybRecord objects can also be constructed directly. A minimum amount of data necessary for a HybRecord object is the genomic sequence and its corresponding identifier.
Examples
    hyb_record_1 = hybkit.HybRecord('1_100', 'ACTG')
    hyb_record_2 = hybkit.HybRecord('2_107', 'CTAG', '-7.3')
    hyb_record_3 = hybkit.HybRecord('3_295', 'CTTG', energy='-10.3')
Details about segments are provided via python dictionaries with keys specific to each segment. Data can be provided either as strings or as floats/integers (where appropriate). For example, to create a HybRecord object representing the example line given above:
    seg1_props = {'ref_name': 'MIMAT0000078_MirBase_miR-23a_microRNA',
                  'read_start': '1',
                  'read_end': '21',
                  'ref_start': '1',
                  'ref_end': '21',
                  'score': '0.0027'}
    seg2_props = {'ref_name': 'ENSG00000188229_ENST00000340384_TUBB2C_mRNA',
                  'read_start': 23,
                  'read_end': 49,
                  'ref_start': 1181,
                  'ref_end': 1207,
                  'score': 1.2e-06}
    seq_id = '2407_718'
    seq = 'ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC'
    energy = None

    hyb_record = hybkit.HybRecord(seq_id, seq, energy, seg1_props, seg2_props)
    # OR
    hyb_record = hybkit.HybRecord(seq_id, seq, seg1_props=seg1_props, seg2_props=seg2_props)
- Parameters
id (str) -- Identifier for the hyb record
seq (str) -- Nucleotide sequence of the hyb record
energy (str or float, optional) -- Predicted energy of sequence folding in kcal/mol
seg1_props (dict, optional) -- Properties of segment 1 of the record, containing possible segment column keys: (ref_name, read_start, read_end, ref_start, ref_end, score)
seg2_props (dict, optional) -- Properties of segment 2 of the record, containing possible segment column keys: (ref_name, read_start, read_end, ref_start, ref_end, score)
flags (dict, optional) -- Dict with keys of flags for the record and their associated values. By default flags must be defined in ALL_FLAGS but custom flags can be supplied by changing HybRecord.settings['custom_flags']. This setting can also be disabled by setting 'allow_undefined_flags' to True in HybRecord.settings.
allow_undefined_flags (bool, optional) -- If True, allows flags not defined in ALL_FLAGS or HybRecord.settings['custom_flags'] to be added to the record. If not provided, defaults to the value in HybRecord.settings['allow_undefined_flags'].
- Variables
id (str) -- Identifier for the hyb record (Hyb format: <read-num>_<read-count>)
seq (str) -- Nucleotide sequence of the hyb record
energy (str) -- Predicted energy of folding
seg1_props (dict) -- Information on chimeric segment 1, contains segment column keys: ref_name (str), read_start (int), read_end (int), ref_start (int), ref_end (int), and score (float).
seg2_props (dict) -- Information on segment 2, contains segment column keys: ref_name (str), read_start (int), read_end (int), ref_start (int), ref_end (int), and score (float).
flags (dict) -- Dict of flags with possible flag keys and values as defined in the Flags section of the hybkit Hyb File Specification.
fold_record (FoldRecord) -- Information on the predicted secondary structure of the sequence set by set_fold_record().
allow_undefined_flags (bool) -- Whether to allow undefined flags to be set.
- HYBRID_COLUMNS = ('id', 'seq', 'energy')
Record columns 1-3 defining parameters of the overall hybrid, defined by the Hyb format
- SEGMENT_COLUMNS = ('ref_name', 'read_start', 'read_end', 'ref_start', 'ref_end', 'score')
Record columns 4-9 and 10-15, defining annotated parameters of seg1 and seg2 respectively, defined by the Hyb format
- ALL_FLAGS = ('count_total', 'count_last_clustering', 'two_way_merged', 'seq_IDs_in_cluster', 'read_count', 'orient', 'det', 'seg1_type', 'seg2_type', 'seg1_det', 'seg2_det', 'miRNA_seg', 'target_reg', 'ext', 'dataset')
Flags defined by the hybkit package. Flags 1-4 are utilized by the Hyb software package. For information on flags, see the Flags portion of the hybkit Hyb File Specification.
- settings = {'allow_undefined_flags': False, 'allow_unknown_seg_types': False, 'custom_flags': [], 'hyb_placeholder': '.', 'mirna_types': ['miRNA', 'microRNA'], 'reorder_flags': True}
Class-level settings. See settings.HybRecord_settings_info for descriptions.
- TypeFinder
Link to type_finder.TypeFinder class for parsing sequence identifiers in assigning segment types by eval_types().
- SET_PROPS = ('energy', 'full_seg_props', 'fold_record', 'eval_types', 'eval_mirna', 'eval_target')
Properties for the is_set() method.
    energy: energy is not None
    full_seg_props: Each seg key is in segN_props dict and is not None
    fold_record: fold_record has been set
    eval_mirna: miRNA_seg flag has been set
- GEN_PROPS = ('has_indels',)
General record properties for the prop() method.
    has_indels: either the seg1 or the seg2 alignment has insertions/deletions, shown by differing read/reference length for the same alignment
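The has_indels criterion can be sketched as a comparison of aligned lengths. This is a minimal sketch, assuming 1-based inclusive coordinates as described in the Hyb File Specification. The function names are hypothetical and hybkit's own implementation may differ.

```python
def seg_has_indels(seg_props):
    """Return True if read and reference alignment lengths differ."""
    read_len = int(seg_props['read_end']) - int(seg_props['read_start']) + 1
    ref_len = int(seg_props['ref_end']) - int(seg_props['ref_start']) + 1
    return read_len != ref_len


def record_has_indels(seg1_props, seg2_props):
    """Return True if either segment alignment shows an insertion/deletion."""
    return seg_has_indels(seg1_props) or seg_has_indels(seg2_props)


# Coordinates from the example hyb line: both alignments have equal lengths.
seg1 = {'read_start': 1, 'read_end': 21, 'ref_start': 1, 'ref_end': 21}
seg2 = {'read_start': 23, 'read_end': 49, 'ref_start': 1181, 'ref_end': 1207}
print(record_has_indels(seg1, seg2))
```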
- STR_PROPS = ('id_is', 'id_prefix', 'id_suffix', 'id_contains', 'seq_is', 'seq_prefix', 'seq_suffix', 'seq_contains', 'seg1_is', 'seg1_prefix', 'seg1_suffix', 'seg1_contains', 'seg2_is', 'seg2_prefix', 'seg2_suffix', 'seg2_contains', 'any_seg_is', 'any_seg_prefix', 'any_seg_suffix', 'any_seg_contains', 'seg1_type_is', 'seg1_type_prefix', 'seg1_type_suffix', 'seg1_type_contains', 'seg2_type_is', 'seg2_type_prefix', 'seg2_type_suffix', 'seg2_type_contains', 'any_seg_type_is', 'any_seg_type_prefix', 'any_seg_type_suffix', 'any_seg_type_contains')
String-comparison properties for the prop() method.
Field Types:
    id: record.id
    seq: record.seq
    seg1: seg1_props['ref_name']
    seg2: seg2_props['ref_name']
    any_seg: seg1_props['ref_name'] OR seg2_props['ref_name']
    seg1_type: seg1_type flag
    seg2_type: seg2_type flag
Comparisons:
    is: Comparison string matches field exactly
    prefix: Comparison string matches beginning of field
    suffix: Comparison string matches end of field
    contains: Comparison string is contained within field
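Each string property name combines a field type with a comparison, for example any_seg_type + contains. The sketch below shows one way such a name could be decomposed and evaluated against a plain dictionary of field values. The names `COMPARISONS` and `str_prop`, and the dictionary stand-in for a record, are assumptions for illustration and do not reflect hybkit internals.

```python
# Comparison name -> test of a field value against the comparison string.
COMPARISONS = {
    'is': lambda field, query: field == query,
    'prefix': lambda field, query: field.startswith(query),
    'suffix': lambda field, query: field.endswith(query),
    'contains': lambda field, query: query in field,
}


def str_prop(fields, prop, prop_compare):
    """Hypothetical helper: evaluate a '<field>_<comparison>' property."""
    field_name, _, comparison = prop.rpartition('_')
    if comparison not in COMPARISONS:
        raise ValueError(f'Unknown comparison in property: {prop}')
    compare = COMPARISONS[comparison]
    if field_name.startswith('any_seg'):
        # "any_seg" fields are satisfied by either segment.
        names = (field_name.replace('any_seg', 'seg1', 1),
                 field_name.replace('any_seg', 'seg2', 1))
    else:
        names = (field_name,)
    return any(compare(fields[name], prop_compare) for name in names)


fields = {
    'id': '2407_718',
    'seq': 'ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC',
    'seg1': 'MIMAT0000078_MirBase_miR-23a_microRNA',
    'seg2': 'ENSG00000188229_ENST00000340384_TUBB2C_mRNA',
    'seg1_type': 'microRNA',
    'seg2_type': 'mRNA',
}
print(str_prop(fields, 'any_seg_contains', 'TUBB2C'))
print(str_prop(fields, 'seg1_type_is', 'mRNA'))
```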
- MIRNA_PROPS = ('has_mirna', 'no_mirna', 'mirna_dimer', 'mirna_not_dimer', '5p_mirna', '3p_mirna')
miRNA-evaluation-related properties for the prop() method. Requires miRNA_seg flag to be set by eval_mirna() method.
    has_mirna: Either or both of seg1 and seg2 have been identified as a miRNA
    no_mirna: Both seg1 and seg2 have been identified as not a miRNA
    mirna_dimer: Both seg1 and seg2 have been identified as a miRNA
    mirna_not_dimer: One and only one of seg1 or seg2 has been identified as a miRNA
    5p_mirna: Seg1 (5p) has been identified as a miRNA
    3p_mirna: Seg2 (3p) has been identified as a miRNA
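These properties can be read as sets of miRNA_seg flag values ("N", "5p", "3p", "B", "U"). The table in the sketch below is inferred from the property descriptions and the flag definition in the Hyb File Specification. Treating "B" (both) as satisfying 5p_mirna and 3p_mirna is an assumption, and the names are hypothetical.

```python
# Property name -> miRNA_seg flag values for which the property holds.
MIRNA_PROP_FLAG_VALUES = {
    'has_mirna': {'5p', '3p', 'B'},
    'no_mirna': {'N'},
    'mirna_dimer': {'B'},
    'mirna_not_dimer': {'5p', '3p'},
    '5p_mirna': {'5p', 'B'},  # assumption: a dimer counts as a 5p miRNA
    '3p_mirna': {'3p', 'B'},  # assumption: a dimer counts as a 3p miRNA
}


def mirna_prop(mirna_seg_flag, prop):
    """Hypothetical helper: evaluate a miRNA property from the flag value."""
    if mirna_seg_flag is None:
        raise ValueError('miRNA_seg flag is not set; evaluate miRNAs first')
    return mirna_seg_flag in MIRNA_PROP_FLAG_VALUES[prop]


print(mirna_prop('5p', 'has_mirna'))
print(mirna_prop('B', 'mirna_not_dimer'))
```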
- MIRNA_STR_PROPS = ('mirna_is', 'mirna_prefix', 'mirna_suffix', 'mirna_contains', 'target_is', 'target_prefix', 'target_suffix', 'target_contains', 'mirna_seg_type_is', 'mirna_seg_type_prefix', 'mirna_seg_type_suffix', 'mirna_seg_type_contains', 'target_seg_type_is', 'target_seg_type_prefix', 'target_seg_type_suffix', 'target_seg_type_contains')
Comparisons:
    is: Comparison string matches field exactly
    prefix: Comparison string matches beginning of field
    suffix: Comparison string matches end of field
    contains: Comparison string is contained within field
- HAS_PROPS = ('has_indels', 'id_is', 'id_prefix', 'id_suffix', 'id_contains', 'seq_is', 'seq_prefix', 'seq_suffix', 'seq_contains', 'seg1_is', 'seg1_prefix', 'seg1_suffix', 'seg1_contains', 'seg2_is', 'seg2_prefix', 'seg2_suffix', 'seg2_contains', 'any_seg_is', 'any_seg_prefix', 'any_seg_suffix', 'any_seg_contains', 'seg1_type_is', 'seg1_type_prefix', 'seg1_type_suffix', 'seg1_type_contains', 'seg2_type_is', 'seg2_type_prefix', 'seg2_type_suffix', 'seg2_type_contains', 'any_seg_type_is', 'any_seg_type_prefix', 'any_seg_type_suffix', 'any_seg_type_contains', 'has_mirna', 'no_mirna', 'mirna_dimer', 'mirna_not_dimer', '5p_mirna', '3p_mirna', 'mirna_is', 'mirna_prefix', 'mirna_suffix', 'mirna_contains', 'target_is', 'target_prefix', 'target_suffix', 'target_contains', 'mirna_seg_type_is', 'mirna_seg_type_prefix', 'mirna_seg_type_suffix', 'mirna_seg_type_contains', 'target_seg_type_is', 'target_seg_type_prefix', 'target_seg_type_suffix', 'target_seg_type_contains')
All allowed properties for the prop() method. See GEN_PROPS, STR_PROPS, MIRNA_PROPS, and MIRNA_STR_PROPS.
- set_flag(flag_key: str, flag_val: Optional[Union[float, int, str, bool]], allow_undefined_flags: Optional[bool] = None) None
Set the value of record flag_key to flag_val.
- Parameters
flag_key (str) -- Key for flag to set.
flag_val -- Value for flag to set.
allow_undefined_flags (bool, optional) -- Allow inclusion of flags not defined in ALL_FLAGS or in settings['custom_flags']. If not provided, uses setting in 'HybRecord.allow_undefined_flags' (Defaults to value in: settings['allow_undefined_flags']).
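The flag-key check described for set_flag() can be sketched as follows. This is a minimal stand-alone sketch of the rule as documented, with a hypothetical function name `flag_is_allowed`. It does not reproduce hybkit's error handling.

```python
# Flag names copied from the ALL_FLAGS attribute documented above.
ALL_FLAGS = (
    'count_total', 'count_last_clustering', 'two_way_merged',
    'seq_IDs_in_cluster', 'read_count', 'orient', 'det',
    'seg1_type', 'seg2_type', 'seg1_det', 'seg2_det',
    'miRNA_seg', 'target_reg', 'ext', 'dataset',
)


def flag_is_allowed(flag_key, custom_flags=(), allow_undefined_flags=False):
    """Return True if flag_key may be set under the documented rules."""
    if allow_undefined_flags:
        return True  # any key is accepted
    return flag_key in ALL_FLAGS or flag_key in custom_flags


print(flag_is_allowed('read_count'))
print(flag_is_allowed('my_flag'))
print(flag_is_allowed('my_flag', custom_flags=['my_flag']))
```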
- get_seg1_type(require: bool = False) Optional[str]
Return the seg1_type flag if defined, or return None.
- Parameters
require (bool, optional) -- If True, raise an error if seg1_type is not defined.
- get_seg2_type(require: bool = False) Optional[str]
Return the seg2_type flag if defined, or return None.
- Parameters
require (bool, optional) -- If True, raise an error if seg2_type is not defined.
- get_seg_types(require: bool = False) Tuple[Optional[str], Optional[str]]
Return "seg1_type" (or None), "seg2_type" (or None) flags.
Return a tuple of the seg1_type and seg2_type flags for each respective flag that is defined, or None for each flag that is not.
- Parameters
require (bool, optional) -- If True, raise an error if either flag is not defined.
- get_read_count(require: bool = False) Optional[int]
Return the read_count flag if defined, otherwise return None.
- Parameters
require (bool, optional) -- If True, raise an error if the "read_count" flag is not defined.
- get_record_count(require: bool = False) int
Return count_total flag if defined, or return 1 (this record).
- Parameters
require (bool, optional) -- If True, raise an error if the "count_total" flag is not defined.
- get_mirna_props(allow_mirna_dimers: bool = False, require: bool = True) Optional[Dict]
Return the seg_props dict corresponding to the miRNA segment, if set.
If eval_mirna() has been run, return the seg_props dict corresponding to the miRNA segment type as determined by checking the miRNA_seg flag, or None if the record does not contain a miRNA.
- get_target_props(allow_mirna_dimers: bool = False, require: bool = True) Optional[Dict]
Return the seg_props dict corresponding to the target segment, if set.
If eval_mirna() has been run, return the seg_props dict corresponding to the target segment type as determined by checking the miRNA_seg flag (and returning the other segment), or None if the record does not contain a miRNA or contains two miRNAs.
- Parameters
allow_mirna_dimers (bool, optional) -- If True, consider miRNA dimers as a miRNA/target pair and return the 3p miRNA segment properties as the arbitrarily-selected "target" of the dimer pair.
require (bool, optional) -- If True, raise an error if the read does not contain a single target-annotated segment (Default: True).
- eval_types(allow_unknown: Optional[bool] = None) None
Find the types of each segment using the TypeFinder class.
This method provides HybRecord.seg1_props and HybRecord.seg2_props to the TypeFinder class, linked as attribute HybRecord.TypeFinder. This uses the method TypeFinder.find, set by TypeFinder.set_method or TypeFinder.set_custom_method, to set the seg1_type and seg2_type flags if not already set.
To use a type-finding method other than the default, prepare the TypeFinder class by preparing and setting TypeFinder.params and using TypeFinder.set_method.
- Parameters
allow_unknown (bool, optional) -- If True, allow segment types that cannot be identified and set them as "unknown". Otherwise raise an error. If not provided, uses the setting in settings['allow_unknown_seg_types'].
- set_fold_record(fold_record: Union[FoldRecord, Tuple[FoldRecord, Any]], allow_energy_mismatch: bool = False) None
Check and set provided fold_record (FoldRecord) as attribute fold_record.
Ensures that fold_record argument is an instance of FoldRecord and has a matching sequence to this HybRecord, then set as HybRecord.fold_record.
- Parameters
fold_record (FoldRecord) -- FoldRecord instance to set as HybRecord.fold_record.
allow_energy_mismatch (bool, optional) -- If True, allow mismatched fold_record and HybRecord energy. Otherwise, raise an error.
- eval_mirna(override: bool = False, mirna_types: Optional[bool] = None) None
Analyze and set miRNA properties from type properties in the hyb record.
If not already done, determine whether a miRNA exists within this record and set the miRNA_seg flag. This evaluation requires the seg1_type and seg2_type flags to be populated, which can be performed by the eval_types() method.
- Parameters
override (bool, optional) -- If True, override existing miRNA_seg flag if present.
mirna_types (list, tuple, or set, optional) -- Iterable of strings representing sequence types considered as miRNA. If not provided, the types from settings['mirna_types'] are used (it is suggested that this be provided as a set for fastest checking).
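The evaluation described above can be sketched as a membership test of each segment type against the set of miRNA types. This is a minimal sketch under stated assumptions: the function name `find_mirna_seg` is hypothetical, the default types are copied from settings['mirna_types'], and the "U" (unknown) outcome is not modeled.

```python
# Default miRNA type names, copied from the documented settings['mirna_types'].
DEFAULT_MIRNA_TYPES = frozenset({'miRNA', 'microRNA'})


def find_mirna_seg(seg1_type, seg2_type, mirna_types=DEFAULT_MIRNA_TYPES):
    """Hypothetical helper: derive a miRNA_seg flag value from segment types."""
    seg1_is_mirna = seg1_type in mirna_types
    seg2_is_mirna = seg2_type in mirna_types
    if seg1_is_mirna and seg2_is_mirna:
        return 'B'   # both segments are miRNAs
    if seg1_is_mirna:
        return '5p'  # seg1 is the miRNA
    if seg2_is_mirna:
        return '3p'  # seg2 is the miRNA
    return 'N'       # no miRNA present


print(find_mirna_seg('microRNA', 'mRNA'))
print(find_mirna_seg('mRNA', 'lncRNA'))
```

Passing the types as a set (or frozenset) keeps each membership test constant-time, which matches the suggestion in the parameter description.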
- mirna_details(detail: Literal['all', 'mirna_ref', 'target_ref', 'mirna_seg_type', 'target_seg_type', 'mirna_seq', 'target_seq', 'mirna_fold', 'target_fold'] = 'all', allow_mirna_dimers: bool = False) Optional[Union[Dict, str]]
Provide a detail about the miRNA or target following eval_mirna().
Analyze miRNA properties within the sequence record and provide a detail as output. Unless allow_mirna_dimers is True, this method requires record to contain a non-dimer miRNA, otherwise an error will be raised.
- Parameters
detail (str) -- Type of detail to return. Options include:
    all: Dict of all properties (default)
    mirna_ref: Identifier for Assigned miRNA
    target_ref: Identifier for Assigned Target
    mirna_seg_type: Assigned seg_type of miRNA
    target_seg_type: Assigned seg_type of target
    mirna_seq: Annotated subsequence of miRNA
    target_seq: Annotated subsequence of target
    mirna_fold: Annotated fold substring of miRNA (requires fold_record set)
    target_fold: Annotated fold substring of target (requires fold_record set)
allow_mirna_dimers (bool, optional) -- Allow miRNA/miRNA dimers. The 5p-position will be assigned as the "miRNA", and the 3p-position will be assigned as the "target".
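The mirna_seq and target_seq details are subsequences of the read selected by each segment's read coordinates. The sketch below slices the example record's sequence, assuming 1-based inclusive read_start/read_end values as stated in the Hyb File Specification. The helper name `segment_subseq` is hypothetical.

```python
def segment_subseq(seq, seg_props):
    """Hypothetical helper: slice a segment from the read (1-based, inclusive)."""
    start = int(seg_props['read_start'])
    end = int(seg_props['read_end'])
    return seq[start - 1:end]


seq = 'ATCACATTGCCAGGGATTTCCAATCCCCAACAATGTGAAAACGGCTGTC'
seg1_props = {'read_start': '1', 'read_end': '21'}  # miRNA segment (5p)
seg2_props = {'read_start': 23, 'read_end': 49}     # target segment (3p)

mirna_seq = segment_subseq(seq, seg1_props)
target_seq = segment_subseq(seq, seg2_props)
print(mirna_seq)
print(target_seq)
```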
- mirna_detail(*args, **kwargs)
Deprecated alias for mirna_details().
Deprecated since version v0.3.0.
- is_set(prop: str) bool
Return True if HybRecord property "prop" is set (if relevant) and is not None.
Options described in SET_PROPS.
- Parameters
prop (str) -- Property / Analysis to check
- not_set(prop: str) bool
Return False if HybRecord property "prop" is set (if relevant) and is not None.
(returns not is_set(prop))
- Parameters
prop (str) -- Property / Analysis to check
- prop(prop: str, prop_compare: Optional[str] = None) bool
Return True if HybRecord has property: prop.
Check property against list of allowed properties in HAS_PROPS. If query property has a string comparator, provide this in prop_compare. Raises an error if a prerequisite field is not set (use is_set() to check whether properties are set).
Specific properties available to check are described in attributes:
    General Record Properties
    Field String Comparison Properties
    miRNA-Associated Record Properties
    miRNA-Associated String Comparison Properties
- has_prop(*args, **kwargs)
Return True if HybRecord has property: prop.
Deprecated since version v0.3.0: Use prop() instead.
- to_line(newline: bool = True, sep: str = '\t') str
Return a hyb-format string representation of the record.
- to_csv(newline: bool = False) str
Return a comma-separated hyb-format string representation of the record.
- Parameters
newline (bool, optional) -- If True, end the returned string with a newline.
- to_fields(missing_obj: Optional[Union[float, int, str, bool]] = None) dict
Return a python dictionary representation of the record.
Returns a dictionary with keys corresponding to the fields in the hyb-format file, and values corresponding to the values in the record. Output can be used with the pandas DataFrame constructor or csv.DictWriter.
- Parameters
missing_obj (optional) -- Object to use for missing values. Default = None.
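To illustrate how a field dictionary pairs with csv.DictWriter, the sketch below writes one row using the field names documented for to_fields_header(). The dictionary is built by a hypothetical stand-in function `example_fields` rather than by a HybRecord, so the exact value types hybkit returns (for example, for the flags field) are not represented here.

```python
import csv
import io

# Field names as documented for HybRecord.to_fields_header().
FIELDS = [
    'id', 'seq', 'energy',
    'seg1_ref_name', 'seg1_read_start', 'seg1_read_end',
    'seg1_ref_start', 'seg1_ref_end', 'seg1_score',
    'seg2_ref_name', 'seg2_read_start', 'seg2_read_end',
    'seg2_ref_start', 'seg2_ref_end', 'seg2_score',
    'flags',
]


def example_fields(record_id, seq, missing_obj=None):
    """Hypothetical stand-in for a record's field dictionary."""
    fields = {name: missing_obj for name in FIELDS}
    fields['id'] = record_id
    fields['seq'] = seq
    return fields


buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(example_fields('1_100', 'ACTG'))  # None is written as empty
output = buffer.getvalue()
print(output)
```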
- to_fasta_record(mode: Literal['hybrid', 'seg1', 'seg2', 'mirna', 'target'] = 'hybrid', annotate: bool = True, allow_mirna_dimers: bool = False) None
Return nucleotide sequence as BioPython SeqRecord object.
- Parameters
mode (str, optional) -- Determines which sequence component to return. Options:
    hybrid: Entire hybrid sequence (default)
    seg1: Sequence 1 (if defined)
    seg2: Sequence 2 (if defined)
    mirna: miRNA sequence of miRNA/target pair (if defined, else None)
    target: Target sequence of miRNA/target pair (if defined, else None)
annotate (bool, optional) -- Add name of components to fasta sequence identifier if present.
allow_mirna_dimers (bool, optional) -- If True, allow miRNA dimers to be returned as miRNA sequence (the 5p segment will be selected as the "miRNA").
- to_fasta_str(mode: Literal['hybrid', 'seg1', 'seg2', 'mirna', 'target'] = 'hybrid', annotate: bool = True) str
Return nucleotide sequence as a fasta string.
- Parameters
mode (str, optional) -- as with to_fasta_record() method.
annotate (bool, optional) -- Add name of components to fasta sequence identifier if present.
- classmethod from_line(line: str, hybformat_id: bool = False, hybformat_ref: bool = False) Self
Construct a HybRecord instance from a single-line hyb-format string.
The Hyb software package ([Travis2014]) records read-count information in the "id" field of the record, which can be read by setting hybformat_id=True. Additionally, the Hyb hOH7 database contains the segment type in the identifier of each reference in the 4th field, which can be read by setting hybformat_ref=True.
- Parameters
line (str) -- hyb-format string containing record information.
hybformat_id (bool, optional) -- If True, read count information from identifier in <read_number>_<read_count> format.
hybformat_ref (bool, optional) -- If True, read additional record information from identifier in <gene_id>_<transcript_id>_<gene_name>_<seg_type> format.
- Returns
HybRecord instance containing record information.
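The two Hyb-style identifier formats can be sketched with plain string splitting. The helper names below are hypothetical, and splitting from both ends (so that a gene name containing underscores stays intact) is an assumption for illustration, not a description of how from_line() parses identifiers.

```python
def parse_hybformat_id(record_id):
    """Split '<read_number>_<read_count>' into (read_number, read_count)."""
    read_number, read_count = record_id.rsplit('_', 1)
    return read_number, int(read_count)


def parse_hybformat_ref(ref_name):
    """Split '<gene_id>_<transcript_id>_<gene_name>_<seg_type>' into parts."""
    gene_id, transcript_id, remainder = ref_name.split('_', 2)
    gene_name, seg_type = remainder.rsplit('_', 1)
    return {
        'gene_id': gene_id,
        'transcript_id': transcript_id,
        'gene_name': gene_name,
        'seg_type': seg_type,
    }


read_number, read_count = parse_hybformat_id('2407_718')
ref_info = parse_hybformat_ref('ENSG00000188229_ENST00000340384_TUBB2C_mRNA')
print(read_number, read_count)
print(ref_info['gene_name'], ref_info['seg_type'])
```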
- classmethod from_fasta_records(seg1_record: None, seg2_record: None, hyb_id: Optional[str] = None, energy: Optional[Union[float, int, str]] = None, flags: Optional[Dict[str, Any]] = None) Self
Construct a HybRecord instance from two BioPython SeqRecord Objects.
Create an artificial HybRecord from two SeqRecord objects.
For the hybrid:
    id: [seg1_record.id]--[seg2_record.id] (overwritten by the "hyb_id" parameter if provided)
    seq: seg1_record.seq + seg2_record.seq
For each segment:
    FASTA_Sequence_ID -> segN_ref_name
    FASTA_Description -> Flags: segN_det (Overwritten if segN_det flag is provided directly)
Optional fields to add via function arguments:
    hyb_id
    energy
    flags
- Parameters
seg1_record (SeqRecord) -- Biopython SeqRecord object containing information on the left/first/5p hybrid segment (seg1)
seg2_record (SeqRecord) -- Biopython SeqRecord object containing information on the right/second/3p hybrid segment (seg2)
hyb_id (str, optional) -- Identifier for the hyb record (overwrites generated id if provided)
energy (str or float, optional) -- Predicted energy of sequence folding in kcal/mol
flags (dict, optional) -- Dict with keys of flags for the record and their associated values. Any flags provided overwrite default-generated flags.
- Returns
HybRecord instance containing record information.
- classmethod to_fields_header() Literal['id', 'seq', 'energy', 'seg1_ref_name', 'seg1_read_start', 'seg1_read_end', 'seg1_ref_start', 'seg1_ref_end', 'seg1_score', 'seg2_ref_name', 'seg2_read_start', 'seg2_read_end', 'seg2_ref_start', 'seg2_ref_end', 'seg2_score', 'flags']
Return a list of the fields in a HybRecord object.
For use with the to_fields() method.
- classmethod to_csv_header(newline: bool = False) Literal['id,seq,energy,seg1_ref_name,seg1_read_start,seg1_read_end,seg1_ref_start,seg1_ref_end,seg1_score,seg2_ref_name,seg2_read_start,seg2_read_end,seg2_ref_start,seg2_ref_end,seg2_score,flags']
Return a comma-separated string representation of the fields in the record.
For use with the to_csv() method.
- Parameters
newline (bool, optional) -- If True, end the returned string with a newline.
HybFile Class
- class hybkit.HybFile(path: str, *args: Any, hybformat_id: Optional[bool] = None, hybformat_ref: Optional[bool] = None, from_file_like: bool = False, **kwargs: Any)
Wrapper for a hyb-format text file which returns entries (lines) as HybRecord objects.
- Parameters
path (str) -- Path to text file to open as hyb-format file.
*args -- Arguments passed to the open() function to open a text file for reading/writing.
hybformat_id (bool, optional) -- If True, during parsing of lines read count information from identifier in <read_number>_<read_count> format. Defaults to the value in settings['hybformat_id'].
hybformat_ref (bool, optional) -- If True, during parsing of lines read additional record information from identifier in <gene_id>_<transcript_id>_<gene_name>_<seg_type> format. Defaults to the value in settings['hybformat_ref'].
from_file_like (bool, optional) -- If True, the first argument is treated as a file-like object (such as io.StringIO or gzip.GzipFile) and the remaining positional arguments are ignored. (Default: False)
**kwargs -- Keyword arguments passed to the open() function to open a text file for reading/writing.
- Variables
- settings = {'hybformat_id': False, 'hybformat_ref': False}
Class-level settings. See hybkit.settings.HybFile_settings_info for descriptions.
- write_record(write_record: HybRecord) None
Write a HybRecord object to file as a Hyb-format string.
Unlike the file.write() method, this method will add a newline to the end of each written record line.
- Parameters
write_record (HybRecord) -- Record to write.
- write_records(write_records: Iterable[HybRecord]) None
Write a sequence of HybRecord objects as hyb-format lines to the Hyb file.
Unlike the file.writelines() method, this method will add a newline to the end of each written record line.
- write(*_args, **_kwargs) None
Implement no-op / error for "write" method to catch errors.
Use write_record() or write_fh() instead.
- classmethod open(path: str, *args: Any, hybformat_id: Optional[bool] = None, hybformat_ref: Optional[bool] = None, **kwargs: Any) Self
Open a path to a text file using open() and return a HybFile object.
Arguments match those of the Python3 built-in open() function and are passed directly to it.
This method is provided as a convenience function for drop-in replacement of the built-in open() function.
Specific keyword arguments are provided for HybFile-specific settings:
- Parameters
path (str) -- Path to file to open.
hybformat_id (bool, optional) -- If True, during parsing of lines read count information from identifier in <read_number>_<read_count> format. Defaults to the value in settings['hybformat_id'].
hybformat_ref (bool, optional) -- If True, during parsing of lines read additional record information from identifier in <gene_id>_<transcript_id>_<gene_name>_<seg_type> format. Defaults to the value in settings['hybformat_ref'].
- Example usage:
with HybFile.open('path/to/file.hyb', 'r') as hyb_file:
    for record in hyb_file:
        print(record)
FoldRecord Class
- class hybkit.FoldRecord(id: str, seq: str, fold: str, energy: Optional[Union[float, int, str]] = None, seq_type: Optional[Literal['static', 'dynamic']] = None)
Class for storing secondary structure (folding) information for a nucleotide sequence.
This class supports the following file types: (Data courtesy of [Gay2018])
- Example:
34_151138_MIMAT0000076_MirBase_miR-21_microRNA_1_19-...
TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG
.....((((((.((((((......)))))).)))))) (-11.1)
- Example:
41 dG = -8 dH = -93.9 seq1_name-seq2_name
1 A 0 2 0 1 0 0
2 G 1 3 0 2 0 0
... ... ...
40 G 39 41 11 17 39 41
41 T 40 0 10 18 40 0
The minimum data necessary for a FoldRecord object are a sequence identifier, a genomic sequence, and its fold representation.
Two types of FoldRecord objects are supported, 'static' and 'dynamic'. Static FoldRecord objects are those where the 'seq' attribute matches exactly to the corresponding HybRecord.seq attribute (where applicable). Dynamic FoldRecord objects are those where FoldRecord.seq is reconstructed from aligned regions of a HybRecord.seq chimeric read: longer for chimeras with overlapping alignments, shorter for chimeras with gapped alignments.
Overlapping Alignment Example:
Static:
seg1: 1111111111111111111111
seg2:                   222222222222222222222
seq:  TAGCTTATCAGACTGATGTTTTAGCTTATCAGACTGATG
Dynamic:
seg1: 1111111111111111111111
seg2:                       222222222222222222222
seq:  TAGCTTATCAGACTGATGTTTTTTTTAGCTTATCAGACTGATG
Gapped Alignment Example:
Static:
seg1:   1111111111111111
seg2:                     222222222222222222
seq:  TTAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG
Dynamic:
seg1: 1111111111111111
seg2:                 222222222222222222
seq:  AGCTTATCAGACTGATTAGCTTATCAGACTGATG
Dynamic sequences are found in the Hyb program *_hybrids_ua.hyb file type. This is primarily relevant in error-checking when using the HybRecord.set_fold_record() method.
When the 'static' FoldRecord type is used, the following methods are used for HybRecord.fold_record error-checking:
When the 'dynamic' FoldRecord type is used, the following methods are used for HybRecord.fold_record error-checking:
- Parameters
id (str) -- Identifier for record
seq (str) -- Nucleotide sequence of record.
fold (str) -- Fold representation of record.
energy (str or float, optional) -- Energy of folding for record.
seq_type (str, optional) -- Expect sequence to be 'static' (match exactly to corresponding HybRecord.seq), or 'dynamic' (construct from pieces of HybRecord.seq). If not provided, defaults to the settings['seq_type'] setting. See hybkit.settings.FoldRecord_settings_info for descriptions.
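The relationship between 'static' and 'dynamic' sequences described above can be sketched as a concatenation of the two aligned regions of the chimeric read. The function name `build_dynamic_seq` and the 0-based, half-open span convention are assumptions made for this illustration; hybkit itself works from the alignment coordinates stored in the HybRecord.

```python
def build_dynamic_seq(static_seq, seg1_span, seg2_span):
    """Concatenate the two aligned regions of a chimeric read.

    Spans are 0-based, half-open (start, end) index pairs into static_seq.
    This convention is an assumption made for the sketch.
    """
    seg1_start, seg1_end = seg1_span
    seg2_start, seg2_end = seg2_span
    return static_seq[seg1_start:seg1_end] + static_seq[seg2_start:seg2_end]


# Overlapping alignment: seg1 covers 22 nt, seg2 covers 21 nt, 4 nt are shared,
# so the dynamic sequence is longer than the static read.
overlap_static = "TAGCTTATCAGACTGATG" + "TTTT" + "AGCTTATCAGACTGATG"
overlap_dynamic = build_dynamic_seq(overlap_static, (0, 22), (18, 39))

# Gapped alignment: unaligned bases are dropped, so the dynamic sequence
# is shorter than the static read.
gapped_static = "TT" + "AGCTTATCAGACTGAT" + "GT" + "TAGCTTATCAGACTGATG"
gapped_dynamic = build_dynamic_seq(gapped_static, (2, 18), (20, 38))

print(len(overlap_static), len(overlap_dynamic))
print(len(gapped_static), len(gapped_dynamic))
```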
- Variables
id (str) -- Sequence Identifier (often seg1name-seg2name)
seq (str) -- Genomic Sequence
fold (str) -- Dot-bracket Fold Representation, '(', '.', and ')' characters
energy (str) -- Predicted energy of folding
seq_type (str) -- Whether sequence is 'static' or 'dynamic' (Default: 'static'; see Args for details)
- settings = {'allowed_mismatches': 0, 'error_mode': 'raise', 'fold_placeholder': '.', 'seq_type': 'static'}
Class-level settings. See hybkit.settings.FoldRecord_settings_info for descriptions.
- to_vienna_lines(newline: bool = True) List[str]
Return a list of lines for the record in vienna format.
See (Vienna File Format).
- Parameters
newline (bool, optional) -- Add newline character to the end of each returned line. (Default: True)
- to_vienna_string(newline: bool = True) str
Return a 3-line string for the record in vienna format.
See (Vienna File Format).
- Parameters
newline (bool, optional) -- Terminate the returned string with a newline character. (Default: True)
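As a rough illustration of the three-line layout, the sketch below formats an identifier, sequence, and fold with energy in the same arrangement as the vienna example shown for FoldRecord. The function name `format_vienna_lines`, the tab separator before the energy, and the bare identifier line are assumptions taken from that example rather than from hybkit's source; real vienna files may differ in these details.

```python
def format_vienna_lines(record_id, seq, fold, energy=None, newline=True, sep="\t"):
    """Return three vienna-style lines: identifier, sequence, fold (+ energy).

    The separator and the bare identifier line are assumptions for this sketch.
    """
    suffix = "\n" if newline else ""
    fold_line = fold if energy is None else f"{fold}{sep}({energy})"
    return [record_id + suffix, seq + suffix, fold_line + suffix]


def format_vienna_string(record_id, seq, fold, energy=None, newline=True):
    """Join the three lines into one string, optionally newline-terminated."""
    lines = format_vienna_lines(record_id, seq, fold, energy, newline=False)
    return "\n".join(lines) + ("\n" if newline else "")


text = format_vienna_string(
    "seq1_name-seq2_name",
    "TAGCTTATCAGACTGATGTTAGCTTATCAGACTGATG",
    ".....((((((.((((((......)))))).))))))",
    energy=-11.1,
)
print(text, end="")
```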
- count_hyb_record_mismatches(hyb_record: HybRecord) int
Count mismatches between hyb_record.seq and fold_record.seq.
Uses static_count_hyb_record_mismatches() if seq_type is static, or dynamic_count_hyb_record_mismatches() if seq_type is dynamic.
- Parameters
hyb_record (HybRecord) -- hyb_record for comparison.
- static_count_hyb_record_mismatches(hyb_record: HybRecord) int
Count mismatches between hyb_record.seq and fold_record.seq.
- Parameters
hyb_record (HybRecord) -- hyb_record for comparison.
- dynamic_count_hyb_record_mismatches(hyb_record: HybRecord) int
Count mismatches between hyb_record.seq and dynamic fold_record.seq.
- Parameters
hyb_record (HybRecord) -- hyb_record for comparison
- matches_hyb_record(hyb_record: HybRecord, allowed_mismatches: Optional[int] = None) bool
Return True if self.seq and hyb_record.seq mismatches are <= allowed_mismatches.
- Parameters
hyb_record (HybRecord) -- hyb_record to compare.
allowed_mismatches (int, optional) -- Number of mismatches allowed for a match. If not provided, defaults to the option in settings['allowed_mismatches'].
- ensure_matches_hyb_record(hyb_record: HybRecord, allowed_mismatches: Optional[int] = None) None
Ensure self.seq matches hyb_record.seq, else raise an error.
- Parameters
hyb_record (HybRecord) -- hyb_record to compare.
allowed_mismatches (int, optional) -- Number of mismatches allowed for a match. If not provided, defaults to the option in settings['allowed_mismatches'].
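The count / match / ensure pattern used by these three methods can be sketched on plain strings. This is a simplified stand-in, not hybkit's code: the function names are hypothetical, and treating any length difference as additional mismatches is an assumption of the sketch.

```python
def count_mismatches(seq_a: str, seq_b: str) -> int:
    """Count differing positions; any length difference counts as mismatches."""
    positional = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return positional + abs(len(seq_a) - len(seq_b))


def matches(seq_a: str, seq_b: str, allowed_mismatches: int = 0) -> bool:
    """Return True if the mismatch count is within the allowed limit."""
    return count_mismatches(seq_a, seq_b) <= allowed_mismatches


def ensure_matches(seq_a: str, seq_b: str, allowed_mismatches: int = 0) -> None:
    """Raise an error instead of returning False when sequences disagree."""
    if not matches(seq_a, seq_b, allowed_mismatches):
        found = count_mismatches(seq_a, seq_b)
        raise ValueError(f"{found} mismatches found; {allowed_mismatches} allowed")


print(count_mismatches("TAGCTT", "TAGGTT"))
print(matches("TAGCTT", "TAGGTT", allowed_mismatches=1))
```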
- classmethod from_vienna_lines(record_lines: List[str], error_mode: Optional[Literal['raise', 'warn_return', 'return']] = None, seq_type: Optional[Literal['static', 'dynamic']] = None) Union[Tuple[None, str], Tuple[Literal['NOFOLD'], str], Tuple[Literal['NOENERGY'], str], Self]
Construct instance from a list of 3 strings of vienna-format ([ViennaFormat]) lines.
See Vienna File Format for more details.
- classmethod from_vienna_string(record_string: str, error_mode: Optional[Literal['raise', 'warn_return', 'return']] = None, seq_type: Optional[Literal['static', 'dynamic']] = None) Union[Tuple[None, str], Tuple[Literal['NOFOLD'], str], Tuple[Literal['NOENERGY'], str], Self]
Construct instance from a string representing 3 vienna-format ([ViennaFormat]) lines.
See Vienna File Format for more details.
- classmethod from_ct_lines(record_lines: List[str], error_mode: Optional[Literal['raise', 'warn_return', 'return']] = None, seq_type: Optional[Literal['static', 'dynamic']] = None) Union[Tuple[None, str], Tuple[Literal['NOFOLD'], str], Tuple[Literal['NOENERGY'], str], Self]
Create a FoldRecord from a list of record lines in ".ct" format ([CTFormat]).
See CT File Format for more details.
Warning
This method is in beta stage, and is not well-tested.
- classmethod from_ct_string(record_string: str, error_mode: Optional[Literal['raise', 'warn_return', 'return']] = None, seq_type: Optional[Literal['static', 'dynamic']] = None) Union[Tuple[None, str], Tuple[Literal['NOFOLD'], str], Tuple[Literal['NOENERGY'], str], Self]
Create a FoldRecord entry from a multi-line string from ".ct" format ([CTFormat]).
See CT File Format for more details.
Warning
This method is in beta stage, and is not well-tested.
- Parameters
record_string (str) -- String containing lines of ct record
ViennaFile Class
- class hybkit.ViennaFile(*args: Any, seq_type: Optional[Literal['static', 'dynamic']] = None, error_mode: Optional[Literal['raise', 'warn_return', 'return']] = None, from_file_like: bool = False, **kwargs: Any)
Vienna file wrapper that returns vienna-format file lines as FoldRecord objects.
See Vienna File Format for more information.
- Parameters
seq_type (str, optional) -- Type of FoldRecord to return: static, or dynamic (if not provided, uses FoldRecord.settings['seq_type']).
error_mode (str, optional) -- String representing the error mode. If None, defaults to the value set in settings['error_mode']. Options: "raise": Raise an error when encountered and exit program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no warnings.
from_file_like (bool, optional) -- If True, treat the first argument as a file-like object (such as io.StringIO or gzip.GzipFile) and ignore the remaining positional arguments (Default: False).
*args -- Passed to open().
**kwargs -- Passed to open().
- Variables
Warning
Occasionally fold files can be poorly-formatted. In that case, this iterator attempts error-catching but this is not always successful so verbose error modes are encouraged.
- read_record(override_error_mode: Optional[Literal['raise', 'warn_return', 'return']] = None) Union[Tuple[None, str], Tuple[Literal['NOFOLD'], str], Tuple[Literal['NOENERGY'], str], Self]
Read next three lines and return output as FoldRecord object.
- Parameters
override_error_mode (str) -- Override the error_mode set in the ViennaFile object. See the ViennaFile constructor for more information on allowed error modes.
- classmethod open(path: str, *args: Any, **kwargs: Any) Self
Open a path to a text file using open() and return the relevant file object.
Arguments match those of the Python3 built-in open() function and are passed directly to it.
This method is provided as a convenience function for drop-in replacement of the built-in open() function.
Specific keyword arguments are provided for fold-file-specific settings:
- Parameters
path (str) -- Path to file to open.
seq_type (str, optional) -- Type of FoldRecord to return: static, or dynamic (if not provided, uses FoldRecord.settings['seq_type']).
error_mode (str, optional) -- String representing the error mode. If None, defaults to the value set in settings['error_mode']. Options: "raise": Raise an error when encountered and exit program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no warnings.
*args -- Passed directly to open().
**kwargs -- Passed directly to open().
- Returns
ViennaFile object.
- read_records() List[FoldRecord]
Return list of all FoldRecord objects for this file type.
- settings = {}
Class-level settings. See hybkit.settings.FoldFile_settings_info for descriptions.
- write_record(write_record: FoldRecord) None
Write a FoldRecord object for this file type.
Unlike the file.write() method, this method will add a newline to the end of each written record line.
- Parameters
write_record (FoldRecord) -- FoldRecord object to write.
- write_records(write_records: Iterable[FoldRecord]) None
Write a sequence of FoldRecord objects for this file type.
Unlike the file.writelines() method, this method will add a newline to the end of each written record line.
- Parameters
write_records (list) -- List of FoldRecord objects to write.
CtFile Class
- class hybkit.CtFile(*args: Any, seq_type: Optional[Literal['static', 'dynamic']] = None, error_mode: Optional[Literal['raise', 'warn_return', 'return']] = None, from_file_like: bool = False, **kwargs: Any)
Ct file wrapper that returns ".ct" file lines as FoldRecord objects.
See CT File Format for more information.
Warning
This class is in beta stage, and is not well-tested.
- Parameters
seq_type (str, optional) -- Type of FoldRecord to return: static, or dynamic (if not provided, uses FoldRecord.settings['seq_type']).
error_mode (str, optional) -- String representing the error mode. If None, defaults to the value set in settings['error_mode']. Options: "raise": Raise an error when encountered and exit program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no warnings.
from_file_like (bool, optional) -- If True, treat the first argument as a file-like object (such as io.StringIO or gzip.GzipFile) and ignore the remaining positional arguments (Default: False).
*args -- Passed to open().
**kwargs -- Passed to open().
- Variables
Warning
Occasionally fold files can be poorly-formatted. In that case, this iterator attempts error-catching but this is not always successful so verbose error modes are encouraged.
- read_record() Union[Tuple[None, str], Tuple[Literal['NOFOLD'], str], Tuple[Literal['NOENERGY'], str], Self]
Return the next CT record as a FoldRecord object.
Call next(self.fh) to return the first line of the next entry. Determine the expected number of following lines in the entry, and read that number of lines further. Return lines as a FoldRecord object.
- write_record = None
CtFile Record Writing Not Implemented
- write_records = None
CtFile Record Writing Not Implemented
- classmethod open(path: str, *args: Any, **kwargs: Any) Self
Open a path to a text file using open() and return the relevant file object.
Arguments match those of the Python3 built-in open() function and are passed directly to it.
This method is provided as a convenience function for drop-in replacement of the built-in open() function.
Specific keyword arguments are provided for fold-file-specific settings:
- Parameters
path (str) -- Path to file to open.
seq_type (str, optional) -- Type of FoldRecord to return: static, or dynamic (if not provided, uses FoldRecord.settings['seq_type']).
error_mode (str, optional) -- String representing the error mode. If None, defaults to the value set in settings['error_mode']. Options: "raise": Raise an error when encountered and exit program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no warnings.
*args -- Passed directly to open().
**kwargs -- Passed directly to open().
- Returns
CtFile object.
- read_records() List[FoldRecord]
Return list of all FoldRecord objects for this file type.
- settings = {}
Class-level settings. See hybkit.settings.FoldFile_settings_info for descriptions.
HybFoldIter Class
- class hybkit.HybFoldIter(hybfile_handle: HybFile, foldfile_handle: FoldFile, combine: bool = False, iter_error_mode: Optional[Literal['raise', 'warn_return', 'warn_skip', 'skip', 'return']] = None)
Iterator for simultaneous iteration over a HybFile and FoldFile object.
This class provides an iterator to iterate through a HybFile and either a ViennaFile or a CtFile simultaneously to return a HybRecord and FoldRecord.
Basic error checking / catching is performed based on the value of the settings['iter_error_mode'] setting.
- Parameters
hybfile_handle (HybFile) -- HybFile object for iteration
foldfile_handle (ViennaFile or CtFile) -- ViennaFile or CtFile object for iteration
combine (bool, optional) -- Use HybRecord.set_fold_record(FoldRecord) and return only the HybRecord.
iter_error_mode (str, optional) -- Error mode to use for reading FoldRecord objects. If not set, defaults to the value in settings['iter_error_mode'].
- Returns
- settings = {'error_checks': ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch', 'energy_mismatch'], 'iter_error_mode': 'warn_skip', 'max_sequential_skips': 100}
Class-level settings. See settings.HybFoldIter_settings_info for descriptions.
hybkit.type_finder
hybkit TypeFinder Class.
This module contains the TypeFinder class to work with HybRecord to parse sequence identifiers to identify sequence type.
- class hybkit.type_finder.TypeFinder
Class for parsing identifiers to identify sequence 'type'.
Designed to be used by the hybkit.HybRecord class.
- Variables
params (dict) -- Stored parameters for string parsing, where applicable.
- find_with_params = None
Placeholder for storing active method, set with set_method() (see set_method() for details).
- params = None
Placeholder for parameters for active method, set with set_method() (see set_method() for details).
- default_method = 'hybformat'
Default method assigned using check_set_method()
- methods = {'hybformat': 'method_hybformat', 'id_map': 'method_id_map', 'string_match': 'method_string_match'}
Dict of provided methods available to assign segment types:
'hybformat': method_hybformat()
'string_match': method_string_match()
'id_map': method_id_map()
- param_methods = {'hybformat': None, 'id_map': 'make_id_map_params', 'string_match': 'make_string_match_params'}
Dict of param generation methods for type finding methods:
'hybformat': N/A
'string_match': make_string_match_params()
'id_map': make_id_map_params()
- param_methods_needs_file = {'hybformat': False, 'id_map': True, 'string_match': True}
Dict of whether parameter generation methods need an input file:
'hybformat': False
'string_match': True
'id_map': True
- classmethod set_method(method: str, params: Optional[Dict[str, Any]] = None) None
Select method to use when finding types.
Available methods are listed in methods.
- classmethod method_is_set() bool
Return whether a TypeFinder method has been set.
Methods should be set with set_method().
- Returns
True if a method has been set, False otherwise.
- Return type
- classmethod check_set_method() None
If no TypeFinder method set, set as default_method.
- classmethod find(seg_props: Dict[str, Union[float, int, str]]) Optional[str]
Find type of segment using TypeFinder.find_custom_method().
If a TypeFinder method has been set with set_method(), use the current parameters set in params to find the type of the provided segment by calling:
seg_type = TypeFinder.find_custom_method(seg_props, TypeFinder.params)
- Parameters
seg_props (dict) -- seg_props from hybkit.HybRecord
- Returns
Type of the provided segment, or None if a type cannot be identified.
- Return type
- classmethod set_custom_method(method: Callable, params: Optional[dict] = None) None
Set the method for use to find seg types.
This method is for providing a custom function. To use the included functions, use set_method(). Custom functions provided must have the signature:
seg_type = custom_method(self, seg_props, params)
This function should return the string of the assigned segment type if found, or a None object if the type cannot be found. It can also take a dictionary in the "params" argument that specifies additional or dynamic search properties, as desired.
- Parameters
method (method) -- Method to set for use.
params (dict, optional) -- dict of custom parameters to set for use.
- static method_hybformat(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) Optional[str]
Return the type of the provided segment, or None if segment cannot be identified.
This method works with sequence / alignment mapping identifiers in the format of the reference database provided by the Hyb Software Package, specifically identifiers of the format:
<gene_id>_<transcript_id>_<gene_name>_<seg_type>
This method returns the last component of the identifier, split by "_", as the identified sequence type (returns None if the segment identifier does not contain "_").
Example
"MIMAT0000076_MirBase_miR-21_microRNA" ---> "microRNA".
- Parameters
seg_props (dict) -- seg_props from hybkit.HybRecord
params (dict, optional) -- Unused in this method.
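The rule described above amounts to a single string split. The following stand-alone sketch mirrors the documented behavior; the function name `find_type_hybformat` is hypothetical, and the `'ref_name'` key used to hold the segment identifier in `seg_props` is an assumption of this illustration.

```python
from typing import Optional


def find_type_hybformat(seg_props: dict, params: Optional[dict] = None) -> Optional[str]:
    """Return the last '_'-separated component of the identifier, or None."""
    ref_name = seg_props["ref_name"]  # assumed key for the segment identifier
    if "_" not in ref_name:
        return None
    return ref_name.split("_")[-1]


example_props = {"ref_name": "MIMAT0000076_MirBase_miR-21_microRNA"}
print(find_type_hybformat(example_props))
```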
- static method_string_match(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) Optional[str]
Return the type of the provided segment, or None if unidentified.
This method attempts to find a string matching a specific pattern within the identifier of the aligned segment. Search options include "startswith", "contains", "endswith", and "matches", and returns the first type matching the criteria. The required params dict should contain a key for each desired search type, with a list of 2-tuples for each search-string with assigned-type.
Example
params = {'endswith': [('_miR', 'microRNA'), ('_trans', 'mRNA') ]}
This dict can be generated with the associated make_string_match_params() method and an associated csv legend file with format:
#comment line
#search_type,search_string,seg_type
endswith,_miR,microRNA
endswith,_trans,mRNA
- static make_string_match_params(legend_file: str) dict
Read csv and return a dict of search parameters for method_string_match().
The my_legend.csv file should have the format:
#comment line
#search_type,search_string,seg_type
endswith,_miR,microRNA
endswith,_trans,mRNA
Search_type options include "startswith", "contains", "endswith", and "matches". The produced dict object contains a key for each search type, with a list of 2-tuples for each search-string and associated segment-type.
For example:
{'endswith': [('_miR', 'microRNA'), ('_trans', 'mRNA') ]}
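A minimal sketch of how such a legend could be turned into a params dict and then applied to an identifier is shown below. It is an illustration under stated assumptions, not hybkit's implementation: `read_legend` and `find_type_string_match` are hypothetical names, the legend is read from a file-like object rather than a path, and the order in which search types are tried is a choice made for this sketch.

```python
import csv
import io

SEARCH_TYPES = ("startswith", "contains", "endswith", "matches")


def read_legend(handle):
    """Build {search_type: [(search_string, seg_type), ...]} from csv rows."""
    params = {}
    for row in csv.reader(handle):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment / header lines
        search_type, search_string, seg_type = (item.strip() for item in row[:3])
        if search_type not in SEARCH_TYPES:
            raise ValueError(f"Unknown search type: {search_type!r}")
        params.setdefault(search_type, []).append((search_string, seg_type))
    return params


def find_type_string_match(ref_name, params):
    """Return the first segment type whose search criterion matches, or None."""
    checks = {
        "startswith": lambda name, s: name.startswith(s),
        "contains": lambda name, s: s in name,
        "endswith": lambda name, s: name.endswith(s),
        "matches": lambda name, s: name == s,
    }
    for search_type, check in checks.items():
        for search_string, seg_type in params.get(search_type, []):
            if check(ref_name, search_string):
                return seg_type
    return None


legend = io.StringIO(
    "#comment line\n"
    "#search_type,search_string,seg_type\n"
    "endswith,_miR,microRNA\n"
    "endswith,_trans,mRNA\n"
)
legend_params = read_legend(legend)
print(legend_params)
```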
- static method_id_map(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) Optional[str]
Return the type of the provided segment or None if it cannot be identified.
This method checks to see if the identifier of the segment is present in the params dict. params should be formatted as a dict with keys as sequence identifier names, and the corresponding type as the respective values.
Example
params = {'MIMAT0000076_MirBase_miR-21_microRNA': 'microRNA', 'ENSG00000XXXXXX_NR003287-2_RN28S1_rRNA': 'rRNA'}
This dict can be generated with the associated make_id_map_params() method.
- static make_id_map_params(mapped_id_files: List[str]) dict
Read file(s) into a mapping of sequence identifiers.
This method reads one or more files into a dict for use with the method_id_map() method. The method requires passing a file path (or list/tuple of file paths) of mapped_id_files. Files listed in the mapped_id_files argument should have the format:
#comment line
#seg_id,seg_type
segA_unique_id,segA_type
segB_unique_id,segB_type
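The mapping-file format above can be read with the standard csv module. The sketch below is a simplified stand-in for the documented behavior: `read_id_map` is a hypothetical name, it accepts open file-like objects instead of paths so that it can run without files on disk, and raising an error on conflicting entries is an assumption of this sketch.

```python
import csv
import io


def read_id_map(handles):
    """Merge one or more 'seg_id,seg_type' csv sources into a single dict."""
    id_map = {}
    for handle in handles:
        for row in csv.reader(handle):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and comment / header lines
            seg_id, seg_type = row[0].strip(), row[1].strip()
            if id_map.get(seg_id, seg_type) != seg_type:
                raise ValueError(f"Conflicting types for identifier {seg_id!r}")
            id_map[seg_id] = seg_type
    return id_map


file_a = io.StringIO("#seg_id,seg_type\nsegA_unique_id,segA_type\n")
file_b = io.StringIO("#comment line\nsegB_unique_id,segB_type\n")
id_map_params = read_id_map([file_a, file_b])

# Lookup then mirrors method_id_map(): identifiers not in the dict give None.
print(id_map_params.get("segA_unique_id"), id_map_params.get("missing_id"))
```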
hybkit.analysis
Functions for analysis of HybRecord and FoldRecord objects.
Analysis
- class hybkit.analysis.Analysis(analysis_types: Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]], name: Optional[str] = None, quant_mode: Optional[Literal['single', 'reads', 'records']] = None)
Class for analysis of hybkit HybRecord and FoldRecord objects.
This class contains multiple conceptual analyses for HybRecord/FoldRecord Data:
This class is used by selecting the desired analysis types on object initialization. Analyses are performed by using either the add_hyb_record() or the add_hyb_records() method. The results of the analysis are then available through the get_all_results(), get_analysis_results(), get_specific_result(), and plot_analysis_results() methods, which can return (or plot) the results of all analyses or of a specific subset of analyses.
Details for each respective analysis are provided here:
Energy Analysis:
This analysis evaluates the energy of each HybRecord object and provides a binned histogram of all energy values represented.
- Output Results:
energy_analysis_count (int): Count of energy values evaluated
has_energy_val (int): Count of hyb_records with an energy value
no_energy_val (int): Count of hyb_records without an energy value
energy_min (float): Minimum energy value
energy_max (float): Maximum energy value
energy_mean (float): Mean energy value
energy_std (float): Standard deviation of energy values
binned_energy_vals (Counter): Counter with integer keys of energy values from energy_min to energy_max storing the count of any hyb_records with energy values that fall within that range (rounded to the next highest integer, e.g. -12.5 -> -12).
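The binning rule (round each energy up to the next highest integer) can be sketched with `math.ceil` and a `Counter`. This is an approximation of the listed outputs, not hybkit's code: `summarize_energies` is a hypothetical name, missing energies are represented as `None`, the population standard deviation is an assumed choice, and empty bins between the minimum and maximum are not filled in.

```python
import math
import statistics
from collections import Counter


def summarize_energies(energies):
    """Summarize a list of energy values (None marks a missing value)."""
    values = [float(e) for e in energies if e is not None]
    if not values:
        raise ValueError("No energy values to summarize")
    return {
        "has_energy_val": len(values),
        "no_energy_val": len(energies) - len(values),
        "energy_min": min(values),
        "energy_max": max(values),
        "energy_mean": statistics.mean(values),
        "energy_std": statistics.pstdev(values),  # population std (assumption)
        # Round up toward the next highest integer: -12.5 is binned as -12.
        "binned_energy_vals": Counter(math.ceil(v) for v in values),
    }


energy_summary = summarize_energies([-12.5, -12.0, -8.3, None])
print(energy_summary["binned_energy_vals"])
```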
Type Analysis:
This analysis evaluates the counts of each type of segment included in the HybRecord objects. The types of segments are determined by the seg1_type and seg2_type flags, which are set by the hybkit.HybRecord.eval_types() method.
Requirements:
seg1_type and seg2_type flags must be set for each HybRecord (can be done by hybkit.HybRecord.eval_types()).
- Output Results:
types_analysis_count (int): Count of hybrid types analyzed
hybrid_types (Counter): Counter containing annotated types of seg1 and seg2 (in original 5p / 3p order)
reordered_hybrid_types (Counter): Counter containing annotated types of seg1 and seg2. This is provided in "sorted" order, where types are sorted alphabetically (independent of 5p / 3p position).
mirna_hybrid_types (Counter): Counter containing annotated types of seg1 and seg2. This is provided in "miRNA-prime" order, where a miRNA type is always listed before other types, and then remaining types are sorted alphabetically (independent of 5p / 3p position).
seg1_types (Counter): Counter containing annotated type of segment in position seg1
seg2_types (Counter): Counter containing annotated type of segment in position seg2
all_seg_types (Counter): Counter containing position-independent annotated types
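The three orderings of a hybrid's segment types (original, sorted, and "miRNA-prime") can be made concrete with a short sketch. The name `hybrid_type_keys`, the tuple representation of a hybrid type, and the set of type strings treated as miRNA are assumptions for this illustration; sorting here uses Python's default case-sensitive string order.

```python
from collections import Counter

MIRNA_TYPES = {"microRNA"}  # assumed set of type strings treated as miRNA


def hybrid_type_keys(seg1_type, seg2_type):
    """Return (original, sorted, miRNA-first) orderings of a type pair."""
    original = (seg1_type, seg2_type)
    reordered = tuple(sorted(original))
    # miRNA types sort first; remaining types keep alphabetical order.
    mirna_first = tuple(sorted(original, key=lambda t: (t not in MIRNA_TYPES, t)))
    return original, reordered, mirna_first


hybrid_types = Counter()
reordered_hybrid_types = Counter()
mirna_hybrid_types = Counter()
for pair in [("mRNA", "microRNA"), ("microRNA", "mRNA"), ("rRNA", "mRNA")]:
    original, reordered, mirna_first = hybrid_type_keys(*pair)
    hybrid_types[original] += 1
    reordered_hybrid_types[reordered] += 1
    mirna_hybrid_types[mirna_first] += 1

print(mirna_hybrid_types)
```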
miRNA Analysis:
Analysis of miRNA segments in hybrids.
The mirna_analysis provides an analysis of what miRNA types are present in the hyb records. If a miRNA dimer is present in a hybrid, this is counted in mirna_dimers. If a single miRNA is present in a hybrid, this is counted in mirnas_5p or mirnas_3p depending on the miRNA location.
- Requirements:
- mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).
- Output Results:
mirna_analysis_count (int): Count of miRNA hybrids analyzed
mirnas_5p (int): Count of 5p miRNAs detected
mirnas_3p (int): Count of 3p miRNAs detected
mirna_dimers (int): Count of miRNA dimers (5p + 3p) detected
non_mirna (int): Count of non-miRNA hybrids detected
has_mirna (int): Hybrids with 5p, 3p, or both as miRNA
Target Analysis:
Analysis of targets in miRNA-containing hybrids.
The target analysis provides an analysis of what annotated sequences and sequence types are targeted by any miRNA within the hyb records. If a miRNA is not present in a hybrid, the hybrid is not included in the analysis. If a miRNA dimer is present in a hybrid, the 5p miRNA is used for the analysis, and the 3p miRNA is considered the "target."
- Requirements:
- mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).
- Output Results:
Fold Analysis:
This analysis evaluates the predicted binding of miRNA within hyb records that contain a miRNA and have an associated FoldRecord object as the attribute fold_record. This includes an analysis and plotting of the predicted binding by position among the provided miRNA.
- Requirements:
- The mirna_seg flag must be set for each HybRecord (can be done by hybkit.HybRecord.eval_mirna()).
- The fold_record attribute must be set for each HybRecord with a corresponding FoldRecord object. This can be done using the hybkit.HybRecord.set_fold_record() method.
- Output Results:
fold_analysis_count (int): Count of miRNA fold predictions analyzed
folds_recorded (int): Count of fold predictions with a miRNA fold
mirna_nt_fold_counts (Counter): Counter with keys of miRNA position index and values of number of miRNAs with a predicted bound state at that index.
mirna_nt_fold_props (Counter): Counter with keys of miRNA position index and values of proportion (0.0 - 1.0) of miRNAs with a predicted bound state at that index.
fold_match_counts (Counter): Counter with keys of count of predicted matches between miRNA and target with values of count of miRNAs with that number of predicted matches.
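The per-position outputs listed above can be sketched from dot-bracket strings alone. In the sketch below, each input string is assumed to be the portion of a fold that covers the miRNA, a position counts as bound when its character is "(" or ")", and positions are indexed from 1; all three are assumptions of this illustration, and `mirna_fold_profile` is a hypothetical name.

```python
from collections import Counter


def mirna_fold_profile(mirna_folds):
    """Tally bound positions across miRNA dot-bracket fold strings."""
    nt_counts = Counter()     # position index -> number of bound miRNAs
    match_counts = Counter()  # number of bound positions -> number of miRNAs
    for fold in mirna_folds:
        bound = [i for i, char in enumerate(fold, start=1) if char in "()"]
        nt_counts.update(bound)
        match_counts[len(bound)] += 1
    total = len(mirna_folds)
    nt_props = {pos: n / total for pos, n in nt_counts.items()} if total else {}
    return nt_counts, nt_props, match_counts


fold_counts, fold_props, fold_matches = mirna_fold_profile(["((..", "(.(."])
print(fold_counts, fold_props, fold_matches)
```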
- Parameters
analysis_types (str or list of str) -- Analysis types to perform
name (str, optional) -- Name of the analysis
quant_mode (str, optional) -- Mode to use for record quantification. Options are "single": One count per record; "reads": If the "read_count" flag is set, count all reads in record (else count 1); "records": If the "record_count" flag is set, count all individual records within combined record (else count 1). If not provided, defaults to the value in Analysis.settings['quant_mode'].
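The three quant_mode options reduce to a small lookup on a record's flags. The sketch below assumes the flags are available as a plain dict and that flag values can be converted with `int()`; the function name `record_weight` is hypothetical.

```python
def record_weight(flags: dict, quant_mode: str = "single") -> int:
    """Return the count contributed by one record under a quant_mode."""
    if quant_mode == "single":
        return 1
    if quant_mode == "reads":
        return int(flags.get("read_count", 1))    # fall back to 1 if unset
    if quant_mode == "records":
        return int(flags.get("record_count", 1))  # fall back to 1 if unset
    raise ValueError(f"Unknown quant_mode: {quant_mode!r}")


example_flags = {"read_count": "151138", "record_count": 3}
for mode in ("single", "reads", "records"):
    print(mode, record_weight(example_flags, mode))
```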
- Variables
- settings = {'out_delim': ',', 'quant_mode': 'single'}
Class-level settings. See hybkit.settings.Analysis_settings for descriptions.
- analysis_options = ['energy', 'type', 'mirna', 'target', 'fold']
- add_hyb_record(hyb_record: HybRecord) None
Add a HybRecord object to the analysis.
- Parameters
hyb_record (HybRecord) -- HybRecord object to be added to the analysis.
- add_hyb_records(hyb_records: List[HybRecord], eval_types: bool = False, eval_mirna: bool = False) None
Add a list of HybRecord objects to the analysis.
- Parameters
hyb_records (HybFile or list of HybRecord) -- HybFile to iterate over, or iterable of HybRecord objects to be added to the analysis.
eval_types (bool) -- If True, evaluate the hybrid type of the HybRecord before adding it to the analysis using hybkit.HybRecord.eval_types().
eval_mirna (bool) -- If True, evaluate the miRNA segment of the HybRecord before adding it to the analysis using hybkit.HybRecord.eval_mirna().
- get_all_results() dict
Return a dictionary with all results for all active analyses.
See Analyses for details on the results for each analysis type.
- Returns
Dictionary with keys of analysis type and values of dictionaries with results for that analysis type.
- Return type
dict
- get_analysis_results(analysis: Literal['energy', 'type', 'mirna', 'target']) Dict
Return a dictionary with all results for a specific analysis.
See Analyses for details on the results for each analysis type.
- get_specific_result(result_key: str) Any
Get a specific result from the analysis.
See Analyses for details on the results for each analysis type.
- Parameters
result_key (str) -- Result key to return from one of the enabled analyses.
- Returns
Result value for the specified result key.
- get_analysis_delim_str(analysis: Optional[Literal['energy', 'type', 'mirna', 'target']] = None, out_delim: Optional[str] = None) str
Return a delimited string containing the results of the analysis.
See Analyses for details on the results for each analysis type.
- Parameters
analysis (str or list of str) -- Analysis type for return results. If not provided, return the results for all active analyses.
out_delim (str) -- Delimiter to use for output. If not provided, defaults to the value in settings['out_delim'].
- write_analysis_delim_str(out_file_name: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None, out_delim: Optional[str] = None) None
Write the results of the analysis to a delimited text file.
See Analyses for details on the results for each analysis type.
- Parameters
out_file_name (str) -- Path to output file. If not provided, defaults to: ./<analysis_name>_<analysis>.csv if analysis/analyses provided, or ./<analysis_name>_multi_analysis.csv if no analysis/analyses provided.
analysis (str or list of str) -- Analysis type for return results. If not provided, return the results for all active analyses.
out_delim (str) -- Delimiter to use for output. If not provided, defaults to the value in settings['out_delim'].
- write_analysis_results_special(out_basename: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None, out_delim: Optional[str] = None) List[str]
Write the results of the analyses to specialized text files.
See Analyses for details on the results for each analysis type.
- Parameters
out_basename (str) -- Path for basename of output file. Files will be renamed using the provided path as the base name. If not provided, defaults to: ./<analysis_name>_<analysis> if name is set, or ./Analysis_multi_<analysis> if name not set.
analysis (str or list of str) -- Analysis type to write results files for. If not provided, write results files for all active analyses.
out_delim (str) -- Delimiter to use for output where applicable. If not provided, defaults to the value in settings['out_delim'].
- plot_analysis_results(out_basename: Optional[str] = None, analysis: Optional[Union[Literal['energy', 'type', 'mirna', 'target'], List[Literal['energy', 'type', 'mirna', 'target']]]] = None) List[str]
Plot the results of the analyses.
See Analyses for details on the results for each analysis type.
- key = 'fold'
hybkit.plot
Methods for plotting analyses of HybRecord and FoldRecord objects.
- hybkit.plot.COLOR_DICT = {'Blue': '#0072B2', 'Bluish Green': '#009E73', 'Orange': '#E69F00', 'Reddish Purple': '#CC79A7', 'Sky Blue': '#56B4E9', 'Vermilion': '#D55E00', 'Yellow': '#F0E442'}
Default Colors for colored plots: Colors selected based on "Points of view: Color blindness" by Bang Wong, Nature Methods, 2011. Colors in RGB nomenclature (0-255): Black (0,0,0), Orange (230,159,0), Sky Blue (86,180,233), Bluish Green (0,158,115), Yellow (240,228,66), Blue (0,114,178), Vermilion (213,94,0), Reddish Purple (204,121,167)
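The hex strings in COLOR_DICT and the RGB triples listed in the description encode the same colors. The sketch below converts one form to the other; the dictionary values are copied from the entry above, while hex_to_rgb is a hypothetical helper written for this illustration and is not part of hybkit.plot.

```python
# Values copied from the documented hybkit.plot.COLOR_DICT.
COLOR_DICT = {
    "Blue": "#0072B2",
    "Bluish Green": "#009E73",
    "Orange": "#E69F00",
    "Reddish Purple": "#CC79A7",
    "Sky Blue": "#56B4E9",
    "Vermilion": "#D55E00",
    "Yellow": "#F0E442",
}


def hex_to_rgb(hex_color):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of integers."""
    digits = hex_color.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


for name, hex_color in COLOR_DICT.items():
    print(name, hex_to_rgb(hex_color))
```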
- hybkit.plot.COLOR_LIST = ['#0072B2', '#D55E00', '#009E73', '#CC79A7', '#E69F00', '#56B4E9', '#F0E442']
List of default colors for colored plots.
- hybkit.plot.ENERGY_HIST_RC_PARAMS = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titlesize': 'large', 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': [6.4, 4.8]}
Default mpl rcParams for energy analysis histograms.
- hybkit.plot.TYPE_PIE_SINGLE_RC_PARAMS = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': [6.4, 4.8]}
Default mpl rcParams for type analysis pie charts.
- hybkit.plot.TYPE_PIE_DUAL_RC_PARAMS = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': (8, 4.8)}
Default mpl rcParams for type analysis pie charts.
- hybkit.plot.TARGET_PIE_RC_PARAMS = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': (9.6, 4.8)}
Default mpl rcParams for target analysis pie charts.
- hybkit.plot.FOLD_MATCH_HIST_RC_PARAMS = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titlesize': 'large', 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': [6.4, 4.8]}
Default mpl rcParams for fold match analysis histograms.
- hybkit.plot.FOLD_NT_COUNTS_HIST_RC_PARAMS = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titlesize': 'large', 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': [6.4, 4.8]}
Default mpl rcParams for fold nt counts analysis histograms.
- hybkit.plot.PIE_DEFAULTS = {'COLORS': ['#0072B2', '#D55E00', '#009E73', '#CC79A7', '#E69F00', '#56B4E9', '#F0E442'], 'MIN_WEDGE_SIZE': 0.04, 'OTHER_THRESHOLD': 0.05, 'SETTINGS': {'autopct': '%1.1f%%', 'counterclock': False, 'shadow': False, 'startangle': 90}}
Default Pie Chart Plot Settings.
- hybkit.plot.BAR_DEFAULTS = {'BAR_ALIGN': 'edge', 'BAR_EDGE_COLOR': None, 'BAR_WIDTH': 0.9}
Default Bar Chart Plot Settings.
- hybkit.plot.BAR_INT_DEFAULTS = {'BAR_ALIGN': 'center', 'BAR_EDGE_COLOR': None, 'BAR_WIDTH': 0.9}
Default Bar Chart of Integer Plot Settings.
- hybkit.plot.ENERGY_DEFAULTS = {'MIN_COUNT': 0, 'MIN_DENSITY': 0.0, 'XLABEL': 'Hybrid Gibbs Free Energy (kcal/mol)', 'YLABEL': 'Hybrid Count'}
Default Bar Chart Plot Settings for Energy Histograms.
- hybkit.plot.energy_histogram(results: Dict[str, Any], plot_file_name: str, title: str, name: Optional[str] = None, rc_params: Dict[str, Any] = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titlesize': 'large', 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': [6.4, 4.8]}, bar_params: Dict[str, Any] = {'BAR_ALIGN': 'edge', 'BAR_EDGE_COLOR': None, 'BAR_WIDTH': 0.9}) None
Plot histogram of hybrid energies from an Analysis fold analysis.
- Parameters
results (dict) -- Dictionary of energy counts from an Analysis fold analysis (Key: binned_energy_vals).
plot_file_name (str) -- Name of output file.
title (str) -- Title of plot.
name (str, optional) -- Name of analysis to be included in plot title.
rc_params (dict, optional) -- Dictionary of mpl rcParams. Defaults to ENERGY_HIST_RC_PARAMS.
bar_params (dict, optional) -- Dictionary of bar plot parameters. Defaults to BAR_DEFAULTS.
- hybkit.plot.type_count(results: Counter, plot_file_name: str, title: str, name: Optional[str] = None, join_entries: bool = False, rc_params: Dict[str, Any] = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': [6.4, 4.8]}) None
Plot pie chart of hybrid type counts from an Analysis type analysis.
- Parameters
results (Counter) -- Counter Object of type counts from an Analysis type analysis.
plot_file_name (str) -- Name of output file.
title (str) -- Title of plot.
name (str, optional) -- Name of analysis to be included in plot title.
join_entries (bool, optional) -- If True, join two-tuple pairs into a single string for plot labels.
rc_params (dict, optional) -- Dictionary of mpl rcParams. Defaults to TYPE_PIE_SINGLE_RC_PARAMS.
- hybkit.plot.type_count_dual(results: Counter, plot_file_name: str, title: str, name: Optional[str] = None, join_entries: bool = False, rc_params: Dict[str, Any] = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': (8, 4.8)}) None
Plot pie chart of hybrid type counts from an Analysis type analysis.
- Parameters
results (Counter) -- Counter Object of type counts from an Analysis type analysis.
plot_file_name (str) -- Name of output file.
title (str) -- Title of plot.
name (str, optional) -- Name of analysis to be included in plot title.
join_entries (bool, optional) -- If True, join two-tuple pairs into a single string for plot labels.
rc_params (dict, optional) -- Dictionary of mpl rcParams. Defaults to TYPE_PIE_DUAL_RC_PARAMS.
- hybkit.plot.target_count(*args, **kwargs) None
Plot pie chart of target counts from an Analysis type analysis.
- Parameters
results (Counter) -- Counter Object of type counts from an Analysis type analysis.
plot_file_name (str) -- Name of output file.
title (str) -- Title of plot.
name (str, optional) -- Name of analysis to be included in plot title.
join_entries (bool, optional) -- If True, join two-tuple pairs into a single string for plot labels.
rc_params (dict, optional) -- Dictionary of mpl rcParams. Defaults to TARGET_PIE_RC_PARAMS.
- hybkit.plot.fold_match_counts_histogram(results: Counter, plot_file_name: str, title: str, name: Optional[str] = None, is_prop: bool = False, rc_params: Dict[str, Any] = {'axes.labelweight': 'bold', 'axes.titlepad': 15, 'axes.titlesize': 'large', 'axes.titleweight': 'bold', 'figure.dpi': 1200, 'figure.figsize': [6.4, 4.8]}, bar_params: Dict[str, Any] = {'BAR_ALIGN': 'center', 'BAR_EDGE_COLOR': None, 'BAR_WIDTH': 0.9}) None
Plot histogram of predicted miRNA/target match count.
- Parameters
results (Counter) -- Counter Object of match counts from an Analysis type analysis.
plot_file_name (str) -- Name of output file.
title (str) -- Title of plot.
is_prop (bool, optional) -- If True, y axis is proportion.
name (str, optional) -- Name of analysis to be included in plot title.
rc_params (dict, optional) -- Dictionary of mpl rcParams. Defaults to FOLD_MATCH_HIST_RC_PARAMS.
bar_params (dict, optional) -- Dictionary of bar plot parameters. Defaults to BAR_INT_DEFAULTS.
- hybkit.plot.fold_mirna_nt_counts_histogram(*args, **kwargs) None
Plot histogram of predicted miRNA/target match count.
- Parameters
results (Counter) -- Counter Object of match counts from an Analysis type analysis.
plot_file_name (str) -- Name of output file.
title (str) -- Title of plot.
is_prop (bool, optional) -- If True, y axis is proportion.
name (str, optional) -- Name of analysis to be included in plot title.
rc_params (dict, optional) -- Dictionary of mpl rcParams. Defaults to FOLD_NT_COUNTS_HIST_RC_PARAMS.
bar_params (dict, optional) -- Dictionary of bar plot parameters. Defaults to BAR_INT_DEFAULTS.
hybkit.settings
This module contains settings information for hybkit classes and methods.
- hybkit.settings.HYB_SUFFIXES = ['.hyb', '.Hyb', '.HYB']
Allowed suffixes for "Hyb" files.
- hybkit.settings.VIENNA_SUFFIXES = ['.vienna', '.Vienna', '.VIENNA']
Allowed suffixes for "Vienna" files.
- hybkit.settings.CT_SUFFIXES = ['.ct', '.Ct', '.CT']
Allowed suffixes for "Connection-Table" files.
- hybkit.settings.FOLD_SUFFIXES = ['.vienna', '.Vienna', '.VIENNA', '.ct', '.Ct', '.CT']
Allowed suffixes for "Vienna" and "Connection-Table" files.
- hybkit.settings.MIRNA_TYPES = ['miRNA', 'microRNA']
Default miRNA types for use in mirna_analysis().
- hybkit.settings.HybRecord_settings_info
Information for settings of HybRecord class. Copied into HybRecord_settings for use at runtime.
hybkit.settings.HybRecord_settings_info = {
'allow_undefined_flags': {'Argp-Flag': None,
'Argp-Opts': {'const': True, 'nargs': '?'},
'Argp-Type': 'custom_bool_from_str',
'Def-Val': False,
'Desc.': 'Allow use of flags not defined in the '
'hybkit-specification order when reading and '
'writing hyb records. As the preferred '
'alternative to using this setting, the '
'--custom_flags argument can be used to '
'supply custom allowed flags.'},
'allow_unknown_seg_types': {'Argp-Flag': None,
'Argp-Opts': {'const': True, 'nargs': '?'},
'Argp-Type': 'custom_bool_from_str',
'Def-Val': False,
'Desc.': 'Allow unknown segment types when assigning '
'segment types.'},
'custom_flags': {'Argp-Flag': None,
'Argp-Opts': {'nargs': '+'},
'Argp-Type': 'str',
'Def-Val': [],
'Desc.': 'Custom flags to allow in addition to those specified in '
'the hybkit specification.'},
'hyb_placeholder': {'Argp-Flag': None,
'Argp-Opts': {},
'Argp-Type': 'str',
'Def-Val': '.',
'Desc.': 'placeholder character/string for missing data in hyb '
'files.'},
'mirna_types': {'Argp-Flag': None,
'Argp-Opts': {'nargs': '+'},
'Argp-Type': 'str',
'Def-Val': ['miRNA', 'microRNA'],
'Desc.': '"seg_type" fields identifying a miRNA'},
'reorder_flags': {'Argp-Flag': None,
'Argp-Opts': {},
'Argp-Type': 'custom_bool_from_str',
'Def-Val': True,
'Desc.': 'Re-order flags to the hybkit-specification order when '
'writing hyb records.'}
}
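The 'Argp-Flag', 'Argp-Opts', 'Argp-Type', and 'Def-Val' keys suggest how each entry could be turned into a command-line option. The following is a minimal sketch of one such mapping using the standard argparse module. The bool_from_str converter and the type_map lookup are hypothetical stand-ins written for this illustration; the actual hybkit conversion logic may differ.

```python
import argparse


def bool_from_str(value):
    """Hypothetical stand-in for the 'custom_bool_from_str' converter."""
    if value.lower() in ("true", "t", "yes", "1"):
        return True
    if value.lower() in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean, got {value!r}")


# One entry copied from HybRecord_settings_info (description omitted).
info = {
    "allow_undefined_flags": {
        "Argp-Flag": None,
        "Argp-Opts": {"const": True, "nargs": "?"},
        "Argp-Type": "custom_bool_from_str",
        "Def-Val": False,
    }
}

# Assumed lookup from the 'Argp-Type' string to a converter callable.
type_map = {"custom_bool_from_str": bool_from_str, "str": str, "int": int}

parser = argparse.ArgumentParser()
for name, entry in info.items():
    flags = ["--" + name]
    if entry["Argp-Flag"] is not None:
        # Put the short flag first when one is defined.
        flags.insert(0, entry["Argp-Flag"])
    parser.add_argument(
        *flags,
        type=type_map[entry["Argp-Type"]],
        default=entry["Def-Val"],
        **entry["Argp-Opts"],
    )

# Flag absent, flag given alone, and flag given with an explicit value.
print(parser.parse_args([]).allow_undefined_flags)
print(parser.parse_args(["--allow_undefined_flags"]).allow_undefined_flags)
print(parser.parse_args(["--allow_undefined_flags", "False"]).allow_undefined_flags)
```

With nargs set to '?' and const set to True, giving the flag without a value selects the const, which is consistent with the optional {True,False} value shown for such flags in the script usage text.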
- hybkit.settings.HybFile_settings_info
Information for settings of HybFile class. Copied into HybFile_settings for use at runtime.
hybkit.settings.HybFile_settings_info = {
'hybformat_id': {'Argp-Flag': None,
'Argp-Opts': {'const': True, 'nargs': '?'},
'Argp-Type': 'custom_bool_from_str',
'Def-Val': False,
'Desc.': 'The Hyb Software Package places further information in '
'the "id" field of the hybrid record that can be used to '
'infer the number of contained read counts. When set to '
'True, the identifiers will be parsed as: '
'"<read_id>_<read_count>"'},
'hybformat_ref': {'Argp-Flag': None,
'Argp-Opts': {'const': True, 'nargs': '?'},
'Argp-Type': 'custom_bool_from_str',
'Def-Val': False,
'Desc.': 'The Hyb Software Package uses a reference database '
'with identifiers that contain sequence type and other '
'sequence information. When set to True, all hyb file '
'identifiers will be parsed as: '
'"<gene_id>_<transcript_id>_<gene_name>_<seg_type>"'}
}
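The two identifier layouts described above can be split with ordinary string operations. The sketch below is a stand-alone illustration, not the parser used by hybkit. Both function names and the sample identifiers are hypothetical, and the reference parser assumes that no field itself contains an underscore.

```python
def parse_hybformat_id(hyb_id):
    """Split '<read_id>_<read_count>' into (read_id, read_count)."""
    # Split on the last underscore so the read count is always the final field.
    read_id, read_count = hyb_id.rsplit("_", 1)
    return read_id, int(read_count)


def parse_hybformat_ref(ref_id):
    """Split '<gene_id>_<transcript_id>_<gene_name>_<seg_type>' into a dict.

    Assumes that no individual field contains an underscore.
    """
    fields = ref_id.split("_")
    if len(fields) != 4:
        raise ValueError(f"Expected 4 '_'-separated fields in {ref_id!r}")
    keys = ("gene_id", "transcript_id", "gene_name", "seg_type")
    return dict(zip(keys, fields))


# Hypothetical identifiers following the documented layouts.
print(parse_hybformat_id("read7_42"))
print(parse_hybformat_ref("GENE1_TRANSCRIPT1_NAME1_mRNA"))
```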
- hybkit.settings.FoldRecord_settings_info
Information for settings of FoldRecord class. Copied into FoldRecord_settings for use at runtime.
hybkit.settings.FoldRecord_settings_info = {
'allowed_mismatches': {'Argp-Flag': None,
'Argp-Opts': {},
'Argp-Type': 'int',
'Def-Val': 0,
'Desc.': 'For DynamicFoldRecords, allowed number of '
'mismatches with a HybRecord.'},
'error_mode': {'Argp-Flag': None,
'Argp-Opts': {'choices': ['raise', 'warn_return', 'return']},
'Argp-Type': 'str',
'Def-Val': 'raise',
'Desc.': 'Mode for handling errors during reading of HybFiles '
"(overridden by HybFoldIter.settings['iter_error_mode'] "
'when using HybFoldIter). Options: "raise": Raise an error '
'when encountered and exit program ; "warn_return": Print '
'a warning and return the error_value ; "return": Return '
'the error value with no program output.'},
'fold_placeholder': {'Argp-Flag': None,
'Argp-Opts': {},
'Argp-Type': 'str',
'Def-Val': '.',
'Desc.': 'Placeholder character/string for missing data for '
'reading/writing fold records.'},
'seq_type': {'Argp-Flag': '-y',
'Argp-Opts': {'choices': ['static', 'dynamic']},
'Argp-Type': 'str',
'Def-Val': 'static',
'Desc.': 'Type of fold record object to use. Options: "static": '
'FoldRecord, requires an exact sequence match to be paired '
'with a HybRecord; "dynamic": DynamicFoldRecord, requires a '
'sequence match to the "dynamic" annotated regions of a '
'HybRecord, and may be shorter/longer than the original '
'sequence.'}
}
- hybkit.settings.FoldFile_settings_info
Information for settings of FoldFile class. Copied into FoldFile_settings for use at runtime.
hybkit.settings.FoldFile_settings_info = {
}
- hybkit.settings.HybFoldIter_settings_info
Information for settings of HybFoldIter class. Copied into HybFoldIter_settings for use at runtime.
hybkit.settings.HybFoldIter_settings_info = {
'error_checks': {'Argp-Flag': None,
'Argp-Opts': {'choices': ['hybrecord_indel',
'foldrecord_nofold',
'max_mismatch',
'energy_mismatch']},
'Argp-Type': 'str',
'Def-Val': ['hybrecord_indel',
'foldrecord_nofold',
'max_mismatch',
'energy_mismatch'],
'Desc.': 'Error checks for simultaneous HybFile and FoldFile '
'parsing. Options: "hybrecord_indel": Error for '
'HybRecord objects where one/both sequences have '
'insertions/deletions in alignment, which prevents '
'matching of sequences; "foldrecord_nofold": Error when '
'failure in reading a fold_record object; '
'"max_mismatch": Error when mismatch between hybrecord '
'and foldrecord sequences is greater than FoldRecord '
'"allowed_mismatches" setting; "energy_mismatch": Error '
'when a mismatch exists between HybRecord and FoldRecord '
'energy values.'},
'iter_error_mode': {'Argp-Flag': None,
'Argp-Opts': {'choices': ['raise',
'warn_return',
'warn_skip',
'skip',
'return']},
'Argp-Type': 'str',
'Def-Val': 'warn_skip',
'Desc.': 'Mode for handling errors found during error checks. '
'Overrides HybRecord "error_mode" setting when using '
'HybFoldIter. Options: "raise": Raise an error when '
'encountered; "warn_return": Print a warning and '
'return the value; "warn_skip": Print a warning and '
'continue to the next iteration; "skip": Continue to '
'the next iteration without any output; "return": '
'return the value without any error output;'},
'max_sequential_skips': {'Argp-Flag': None,
'Argp-Opts': {},
'Argp-Type': 'int',
'Def-Val': 100,
'Desc.': 'Maximum number of record(-pairs) to skip in a '
'row. Limited as several sequential skips '
'usually indicates an issue with record '
'formatting or a desynchronization between '
'files.'}
}
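The interaction between iter_error_mode and max_sequential_skips can be sketched as a generator over generic records. This is an illustration of the documented options only. The iterate_checked function and its check callable are hypothetical, and the use of RuntimeError and the exact point at which the skip limit triggers are assumptions, not the HybFoldIter implementation.

```python
import warnings

VALID_MODES = ("raise", "warn_return", "warn_skip", "skip", "return")


def iterate_checked(records, check, iter_error_mode="warn_skip",
                    max_sequential_skips=100):
    """Yield records, handling those that fail `check` per `iter_error_mode`.

    `check` is a hypothetical callable returning an error message or None.
    """
    if iter_error_mode not in VALID_MODES:
        raise ValueError(f"Unknown iter_error_mode: {iter_error_mode!r}")
    sequential_skips = 0
    for record in records:
        error = check(record)
        if error is None:
            # A good record resets the run of consecutive skips.
            sequential_skips = 0
            yield record
            continue
        if iter_error_mode == "raise":
            raise RuntimeError(error)
        if iter_error_mode.startswith("warn"):
            warnings.warn(error)
        if iter_error_mode.endswith("return"):
            # "warn_return" and "return": hand back the value anyway.
            yield record
            continue
        # "warn_skip" and "skip": drop the record and track consecutive skips.
        sequential_skips += 1
        if sequential_skips > max_sequential_skips:
            raise RuntimeError(
                f"Skipped more than {max_sequential_skips} records in a row"
            )


def check_not_bad(record):
    """Flag the literal string 'bad' as an error (hypothetical check)."""
    return "bad record" if record == "bad" else None


records = ["ok1", "bad", "ok2"]
print(list(iterate_checked(records, check_not_bad, iter_error_mode="skip")))
print(list(iterate_checked(records, check_not_bad, iter_error_mode="return")))
```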
- hybkit.settings.Analysis_settings_info
Information for settings of Analysis class. Copied into Analysis_settings for use at runtime.
hybkit.settings.Analysis_settings_info = {
'out_delim': {'Argp-Flag': None,
'Argp-Opts': {},
'Argp-Type': 'str',
'Def-Val': ',',
'Desc.': 'Delimiter-string to place between fields in analysis '
'output.'},
'quant_mode': {'Argp-Flag': None,
'Argp-Opts': {'choices': ['single', 'reads', 'records']},
'Argp-Type': 'str',
'Def-Val': 'single',
'Desc.': 'Method for counting records. Options: "single": Count '
'each record as a single entry; "reads": Use the number of '
'reads per hyb record as the count (may contain PCR '
'duplicates); "records": Count the number of records '
'represented by each hyb record entry (1 for "unmerged" '
'records, >= 1 for "merged" records)'}
}
- hybkit.settings.HybRecord_settings = {'allow_undefined_flags': False, 'allow_unknown_seg_types': False, 'custom_flags': [], 'hyb_placeholder': '.', 'mirna_types': ['miRNA', 'microRNA'], 'reorder_flags': True}
Settings for HybRecord, created from HybRecord_settings_info
- hybkit.settings.HybFile_settings = {'hybformat_id': False, 'hybformat_ref': False}
Settings for HybFile, created from HybFile_settings_info
- hybkit.settings.FoldRecord_settings = {'allowed_mismatches': 0, 'error_mode': 'raise', 'fold_placeholder': '.', 'seq_type': 'static'}
Settings for FoldRecord, created from FoldRecord_settings_info
- hybkit.settings.FoldFile_settings = {}
Settings for FoldFile, created from FoldFile_settings_info
- hybkit.settings.HybFoldIter_settings = {'error_checks': ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch', 'energy_mismatch'], 'iter_error_mode': 'warn_skip', 'max_sequential_skips': 100}
Settings for HybFoldIter, created from HybFoldIter_settings_info
- hybkit.settings.Analysis_settings = {'out_delim': ',', 'quant_mode': 'single'}
Settings for BaseAnalysis, created from Analysis_settings_info
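Each runtime settings dictionary above holds the 'Def-Val' of the corresponding entry in its *_settings_info dictionary. A minimal sketch of that relationship follows, using an abbreviated copy of Analysis_settings_info. The comprehension and the use of deepcopy (to keep list defaults independent of the info dictionary) are assumptions for illustration, not the hybkit source.

```python
import copy

# Abbreviated copy of the documented Analysis_settings_info entries.
Analysis_settings_info = {
    "out_delim": {"Argp-Type": "str", "Def-Val": ","},
    "quant_mode": {"Argp-Type": "str", "Def-Val": "single"},
}

# Build the runtime settings by taking each entry's default value.
Analysis_settings = {
    name: copy.deepcopy(entry["Def-Val"])
    for name, entry in Analysis_settings_info.items()
}

print(Analysis_settings)
```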
hybkit.util
This module contains helper functions for hybkit's command line scripts.
- hybkit.util.get_argparse_doc(docstring: str) str
Get the argparse description from a docstring.
- Parameters
docstring (str) -- A docstring.
- Returns
A string containing the argparse description.
- hybkit.util.dir_exists(dir_name: str) str
Check if a directory exists at the provided path (else raise), and return a normalized path.
- Parameters
dir_name (str) -- Name of directory to check for existence.
- Returns
A normalized version of the path passed to dir_name.
- hybkit.util.file_exists(file_name: str, required_suffixes: Optional[List[str]] = None) str
Check if a file exists at the provided path, and return a normalized path.
- Parameters
- Returns
A normalized version of the path passed to file_name.
- hybkit.util.hyb_exists(file_name: str) str
Check if a .hyb file exists at the provided path, and return a normalized path.
Wrapper for file_exists() that includes the required suffixes in hybkit.settings.HYB_SUFFIXES.
- Parameters
file_name (str) -- Name of file to check for existence.
- Returns
A normalized version of the path passed to file_name.
- hybkit.util.vienna_exists(file_name: str) str
Check if a .vienna file exists at the provided path, and return a normalized path.
Wrapper for file_exists() that includes the required suffixes in hybkit.settings.VIENNA_SUFFIXES.
- Parameters
file_name (str) -- Name of file to check for existence.
- Returns
A normalized version of the path passed to file_name.
- hybkit.util.ct_exists(file_name: str) str
Check if a .ct file exists at the provided path, and return a normalized path.
Wrapper for file_exists() that includes the required suffixes in hybkit.settings.CT_SUFFIXES.
- Parameters
file_name (str) -- Name of file to check for existence.
- Returns
A normalized version of the path passed to file_name.
- hybkit.util.fold_exists(file_name: str) str
Check if a fold-representing file exists at the provided path, and return a normalized path.
Wrapper for file_exists() that includes the required suffixes in hybkit.settings.FOLD_SUFFIXES.
- Parameters
file_name (str) -- Name of file to check for existence.
- Returns
A normalized version of the path passed to file_name.
- hybkit.util.out_path_exists(file_name: str) str
Check if the directory of the specified output path exists, and return a normalized path.
- Parameters
file_name (str) -- Name of path to an output file to check.
- Returns
A normalized version of the path passed to file_name.
- hybkit.util.make_out_file_name(in_file_name: str, name_suffix: str = 'out', in_suffix: str = '', out_suffix: str = '', out_dir: str = '', seg_sep: str = '_') str
Given an input file name, generate an output file name.
- Parameters
in_file_name (str) -- Name of input file as template.
name_suffix (str) -- Suffix to add to name before file type.
in_suffix (str) -- File type suffix on in_file_name (to remove).
out_suffix (str) -- File type suffix to add to final output file.
out_dir (str) -- Directory path in which to place output file.
seg_sep (str) -- Separator string between file name segments.
- Returns
An output file path based on the input file template.
- hybkit.util.validate_args(args: Namespace, parser: Optional[ArgumentParser] = None) bool
Check supplied arguments to make sure there are no hidden contradictions.
- Current checks:
If explicit output file names supplied, be sure that they match the number of input files provided.
If fold files provided, make sure that they match the number of input hyb files provided.
- Parameters
args (argparse.Namespace) -- The arguments produced by argparse.
parser (argparse.ArgumentParser, optional) -- Argparse parser object to use for verbose outputting of help message.
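The two checks listed above amount to comparing list lengths on the parsed arguments. The sketch below illustrates this with a hypothetical counts_match function. The attribute names in_hyb, in_fold, and out_hyb are taken from the script option names, and treating a missing or empty list as "not supplied" is an assumption.

```python
from argparse import Namespace


def counts_match(args):
    """Return True if optional file lists match the number of input hyb files."""
    n_hyb = len(args.in_hyb)
    out_hyb = getattr(args, "out_hyb", None)
    in_fold = getattr(args, "in_fold", None)
    # Explicit output names, if supplied, must pair one-to-one with inputs.
    if out_hyb and len(out_hyb) != n_hyb:
        return False
    # Fold files, if supplied, must pair one-to-one with inputs.
    if in_fold and len(in_fold) != n_hyb:
        return False
    return True


good = Namespace(in_hyb=["a.hyb", "b.hyb"], in_fold=["a.vienna", "b.vienna"])
bad = Namespace(in_hyb=["a.hyb", "b.hyb"], in_fold=["a.vienna"])
print(counts_match(good), counts_match(bad))
```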
- hybkit.util.validate_args_exit(args: Namespace, parser: Optional[ArgumentParser] = None) None
Check supplied arguments using validate_args(), and exit if a conflict exists.
- Parameters
args (argparse.Namespace) -- The arguments produced by argparse.
parser (argparse.ArgumentParser, optional) -- Argparse parser object to use for verbose outputting of help message.
- hybkit.util.set_setting(setting: str, set_value: Any, verbose: bool = False) str
Take a namespace object as from an argparse parser and update settings.
Each setting in the following settings dictionaries is checked and set where applicable:
HybRecord Settings
HybFile Settings
FoldRecord Settings
FoldFile Settings
HybFoldIter Settings
Analysis Settings
- hybkit.util.set_settings_from_namespace(nspace: Namespace, verbose: bool = False) None
Take a namespace object as from an argparse parser and update settings.
See set_setting() for details
- Parameters
nspace (argparse.Namespace) -- Namespace containing settings
verbose (bool, optional) -- If True, print when changing setting.
hybkit.errors
Module storing hybkit error classes.
- exception hybkit.errors.HybkitError
Base class for Hybkit errors.
- Variables
message (str) -- Human-readable string describing the error.
- exception hybkit.errors.HybkitArgError
Error raised when an invalid argument is provided to a Hybkit function.
Subclass of HybkitError.
- Variables
message (str) -- Human-readable string describing the error.
- exception hybkit.errors.HybkitConstructorError
Error raised when a read error occurs.
Subclass of HybkitError.
- Variables
message (str) -- Human-readable string describing the error.
- exception hybkit.errors.HybkitIterError
Error raised when an error is encountered during Hybkit iteration.
Subclass of HybkitError.
- Variables
message (str) -- Human-readable string describing the error.
- exception hybkit.errors.HybkitMiscError
Error raised when an error is encountered during Hybkit usage.
Subclass of HybkitError.
- Variables
message (str) -- Human-readable string describing the error.
hybkit Toolkit
The hybkit toolkit contains command-line scripts for analysis and manipulation of hyb and fold files. Scripts are implemented in Python3, and the hybkit module must be on the user's PYTHONPATH for script execution.
The command-line options and flags are generated with the Python3 argparse module. Relevant settings pertaining to specific hybkit classes are accessible via command-line flags, as demonstrated in the "shell_analysis" implementations in the Example Analyses.
This version of hybkit includes the following executables:
Utility
Description
hyb_check
Parse a hyb (/fold) file and check for errors
hyb_eval
Evaluate hyb (/fold) records to identify segment types and miRNAs
hyb_filter
Filter a hyb (/fold) file to a specific subset of sequences
hyb_analyze
Perform a type, miRNA, summary, or target analysis on a hyb (/fold) file
Detailed descriptions and usage information are available at each respective script page.
hyb_check
Read one or more hyb (and optional fold) format files and check for errors.
This utility reads in one or more files in hyb-format (see the hybkit Hyb File Specification) and uses hybkit's internal file error checking to check for errors. This can be useful as an initial preparation step for further analyses.
- Example system calls:
hyb_check -i my_file_1.hyb -f my_file_1.vienna
hyb_check -i my_file_1.hyb my_file_2.hyb -f my_file_1.vienna \
    my_file_2.vienna -v --custom_flags myflag
usage: hyb_check [-h] -i PATH_TO/MY_FILE.HYB [PATH_TO/MY_FILE.HYB ...]
[-f [PATH_TO/MY_FILE.VIENNA [PATH_TO/MY_FILE.VIENNA ...]]]
[--version] [-v | -s]
[--mirna_types MIRNA_TYPES [MIRNA_TYPES ...]]
[--custom_flags CUSTOM_FLAGS [CUSTOM_FLAGS ...]]
[--hyb_placeholder HYB_PLACEHOLDER]
[--reorder_flags {True,False}]
[--allow_undefined_flags [{True,False}]]
[--allow_unknown_seg_types [{True,False}]]
[--hybformat_id [{True,False}]]
[--hybformat_ref [{True,False}]]
[--allowed_mismatches ALLOWED_MISMATCHES]
[--fold_placeholder FOLD_PLACEHOLDER] [-y {static,dynamic}]
[--error_mode {raise,warn_return,return}]
[--error_checks {hybrecord_indel,foldrecord_nofold,max_mismatch,energy_mismatch}]
[--iter_error_mode {raise,warn_return,warn_skip,skip,return}]
[--max_sequential_skips MAX_SEQUENTIAL_SKIPS]
Named Arguments
- -i, --in_hyb
REQUIRED path to one or more hyb-format files with a ".hyb" suffix for use in the evaluation.
- -f, --in_fold
Optional path to one or more RNA secondary-structure files with a ".vienna" or ".ct" suffix for use in the evaluation.
- --version
Print version and exit.
- -v, --verbose
Print verbose output during run.
Default: False
- -s, --silent
Print no output during run.
Default: False
Hyb Record Settings
- --mirna_types
"seg_type" fields identifying a miRNA
Default: ['miRNA', 'microRNA']
- --custom_flags
Custom flags to allow in addition to those specified in the hybkit specification.
Default: []
- --hyb_placeholder
placeholder character/string for missing data in hyb files.
Default: "."
- --reorder_flags
Possible choices: True, False
Re-order flags to the hybkit-specification order when writing hyb records.
Default: True
- --allow_undefined_flags
Possible choices: True, False
Allow use of flags not defined in the hybkit-specification order when reading and writing hyb records. As the preferred alternative to using this setting, the --custom_flags argument can be used to supply custom allowed flags.
Default: False
- --allow_unknown_seg_types
Possible choices: True, False
Allow unknown segment types when assigning segment types.
Default: False
Hyb File Settings
- --hybformat_id
Possible choices: True, False
The Hyb Software Package places further information in the "id" field of the hybrid record that can be used to infer the number of contained read counts. When set to True, the identifiers will be parsed as: "<read_id>_<read_count>"
Default: False
- --hybformat_ref
Possible choices: True, False
The Hyb Software Package uses a reference database with identifiers that contain sequence type and other sequence information. When set to True, all hyb file identifiers will be parsed as: "<gene_id>_<transcript_id>_<gene_name>_<seg_type>"
Default: False
Fold Record Settings
- --allowed_mismatches
For DynamicFoldRecords, allowed number of mismatches with a HybRecord.
Default: 0
- --fold_placeholder
Placeholder character/string for missing data for reading/writing fold records.
Default: "."
- -y, --seq_type
Possible choices: static, dynamic
Type of fold record object to use. Options: "static": FoldRecord, requires an exact sequence match to be paired with a HybRecord; "dynamic": DynamicFoldRecord, requires a sequence match to the "dynamic" annotated regions of a HybRecord, and may be shorter/longer than the original sequence.
Default: "static"
- --error_mode
Possible choices: raise, warn_return, return
Mode for handling errors during reading of HybFiles (overridden by HybFoldIter.settings['iter_error_mode'] when using HybFoldIter). Options: "raise": Raise an error when encountered and exit program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no program output.
Default: "raise"
Hyb-Fold Iterator Settings
- --error_checks
Possible choices: hybrecord_indel, foldrecord_nofold, max_mismatch, energy_mismatch
Error checks for simultaneous HybFile and FoldFile parsing. Options: "hybrecord_indel": Error for HybRecord objects where one/both sequences have insertions/deletions in alignment, which prevents matching of sequences; "foldrecord_nofold": Error when failure in reading a fold_record object; "max_mismatch": Error when mismatch between hybrecord and foldrecord sequences is greater than FoldRecord "allowed_mismatches" setting; "energy_mismatch": Error when a mismatch exists between HybRecord and FoldRecord energy values.
Default: ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch', 'energy_mismatch']
- --iter_error_mode
Possible choices: raise, warn_return, warn_skip, skip, return
Mode for handling errors found during error checks. Overrides HybRecord "error_mode" setting when using HybFoldIter. Options: "raise": Raise an error when encountered; "warn_return": Print a warning and return the value; "warn_skip": Print a warning and continue to the next iteration; "skip": Continue to the next iteration without any output; "return": return the value without any error output;
Default: "warn_skip"
- --max_sequential_skips
Maximum number of record(-pairs) to skip in a row. Limited as several sequential skips usually indicates an issue with record formatting or a desynchronization between files.
Default: 100
- Output File Naming:
Output files can be named in two fashions: via automatic name generation, or by providing specific out file names.
- Automatic Name Generation:
For output name generation, the default respective naming scheme is used:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --> OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
This output file path can be modified with the arguments {--out_dir, --out_suffix} described below.
The output directory defaults to the current working directory ($PWD), and can be modified with the --out_dir <dir> argument. Note: The provided directory must exist, or an error will be raised. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_dir MY_OUT_DIR
--> MY_OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
The suffix used for output files is based on the primary actions of the script. It can be specified using --out_suffix <suffix>. This can optionally include the ".hyb" final suffix. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX
--> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
#OR
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX.HYB
--> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
- Specific Output Names:
Alternatively, specific file names can be provided via the -o/--out_hyb argument, ensuring that the same number of input and output files are provided. This argument takes precedence over all automatic output file naming options (--out_dir, --out_suffix), which are ignored if -o/--out_hyb is provided. For Example:
hyb_script [...] --out_hyb MY_OUT_DIR/OUT_FILE_1.HYB MY_OUT_DIR/OUT_FILE_2.HYB
--> MY_OUT_DIR/OUT_FILE_1.hyb
--> MY_OUT_DIR/OUT_FILE_2.hyb
Note: The directory provided with output file paths (MY_OUT_DIR above) must exist, otherwise an error will be raised.
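The automatic naming rules above can be pictured with a minimal Python sketch. This is not hybkit's implementation: the helper name auto_out_path and its parameters are hypothetical, and the sketch assumes the suffix is simply appended to the input file's base name and that the output always receives a lowercase ".hyb" extension.

```python
from pathlib import Path


def auto_out_path(in_path, out_dir=".", out_suffix="_ADDSUFFIX", file_ext=".hyb"):
    """Build OUT_DIR/<input base name><suffix><ext> (hypothetical helper)."""
    out_dir = Path(out_dir)
    if not out_dir.is_dir():
        # The documentation states that a missing output directory is an error.
        raise FileNotFoundError(f"Output directory does not exist: {out_dir}")
    # Accept the suffix with or without the final ".hyb" (case-insensitive).
    if out_suffix.lower().endswith(file_ext):
        out_suffix = out_suffix[: -len(file_ext)]
    # Path.stem drops the directory and the final extension of the input path.
    return out_dir / (Path(in_path).stem + out_suffix + file_ext)


# Both calls resolve to the same output file name.
print(auto_out_path("PATH_TO/MY_FILE_1.hyb", out_suffix="_filtered"))
print(auto_out_path("PATH_TO/MY_FILE_1.hyb", out_suffix="_filtered.hyb"))
```

Checking for the directory before building the name mirrors the note above that the output directory must already exist.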
hyb_filter
Filter hyb (and corresponding fold) files to meet (or exclude) specific criteria.
This script takes one or more filter and/or exclusion criteria and outputs only those records matching (/excluding) those criteria.
The filter criteria and options are based on the options provided by the hybkit.HybRecord.prop() method of the Hybkit API. For more information see the full documentation for the HybRecord class.
- Example System Calls:
hyb_filter -i my_file_1.hyb --filter has_seg_types
# Outputs records that have completed a segtype analysis
hyb_filter -i my_file_1.hyb -f my_file_1.vienna \
    --filter seg_type mRNA
# Outputs hyb and fold records where either segment of the hyb record has a segtype of mRNA
hyb_filter -i my_file_1.hyb --exclude seg_type mRNA
# Outputs records where neither segment has a segtype of mRNA
hyb_filter -i my_file_1.hyb --filter seg1_type mRNA
# Outputs records where the first / 5p segment has a segtype of mRNA
hyb_filter -i my_file_1.hyb my_file_2.hyb -f my_file_1.vienna my_file_2.vienna \
    --filter seg_type_contains RNA
# Outputs all records with a segtype that includes
# the string "RNA" (case-sensitive)
hyb_filter -i my_file_1.hyb --filter seg_contains kshv
# Outputs records where either segment identifier contains
# the string "kshv" (case-sensitive)
Multiple filtering options can be used together.
The -m / --filter_mode argument determines whether "all" (default) or "any" of the provided filters must be true for inclusion.
Note: Matching any exclusion criteria results in exclusion of the record.
- Example System Calls (match ALL criteria):
hyb_filter -i my_file_1.hyb -f my_file_1.vienna \
    --filter seg_contains kshv \
    --filter_2 seg_type miRNA
# Outputs records with either reference sequence identifier containing "kshv"
# and with either segment having an assigned segtype of miRNA
- Example System Calls (match ANY criteria):
hyb_filter -i my_file_1.hyb --filter_mode any \
    --filter seg_type miRNA \
    --filter_2 seg_type lncRNA
# Outputs records containing either segment type matching
# either "miRNA" or "lncRNA" (case-sensitive)
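The way inclusion filters, exclusion filters, and the filter mode combine can be sketched in a few lines of Python. This is an illustration of the rules stated above, not hybkit's code: the function keep_record is hypothetical, it takes already-evaluated boolean results rather than records, and keeping every non-excluded record when no inclusion filter is given is an assumption.

```python
def keep_record(filter_results, exclude_results, filter_mode="all"):
    """Decide whether a record is written to output (hypothetical helper).

    filter_results / exclude_results hold one boolean per --filter* /
    --exclude* criterion, already evaluated for the record.
    """
    if filter_mode not in ("all", "any"):
        raise ValueError(f"Unknown filter_mode: {filter_mode!r}")
    # Matching any exclusion criterion removes the record, in either mode.
    if any(exclude_results):
        return False
    # Assumption: with no inclusion filters, non-excluded records are kept.
    if not filter_results:
        return True
    combine = all if filter_mode == "all" else any
    return combine(filter_results)


# A record that matches the first filter but not the second:
print(keep_record([True, False], [], filter_mode="all"))
print(keep_record([True, False], [], filter_mode="any"))
```

Evaluating exclusions first keeps the rule "any exclusion match drops the record" independent of the chosen filter mode.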
usage: hyb_filter [-h] -i PATH_TO/MY_FILE.HYB [PATH_TO/MY_FILE.HYB ...]
[-f [PATH_TO/MY_FILE.VIENNA [PATH_TO/MY_FILE.VIENNA ...]]]
[-o PATH_TO/OUT_FILE.HYB [PATH_TO/OUT_FILE.HYB ...]]
[-l PATH_TO/OUT_FILE.VIENNA [PATH_TO/OUT_FILE.VIENNA ...]]
[-d OUT_DIR] [-u OUT_SUFFIX] [-m {all,any}]
[--skip_dup_id_before] [--skip_dup_id_after]
[--filter FILTER [FILTER ...]]
[--filter_2 FILTER_2 [FILTER_2 ...]]
[--filter_3 FILTER_3 [FILTER_3 ...]]
[--exclude EXCLUDE [EXCLUDE ...]]
[--exclude_2 EXCLUDE_2 [EXCLUDE_2 ...]]
[--exclude_3 EXCLUDE_3 [EXCLUDE_3 ...]] [--set_dataset]
[--version] [-v | -s]
[--mirna_types MIRNA_TYPES [MIRNA_TYPES ...]]
[--custom_flags CUSTOM_FLAGS [CUSTOM_FLAGS ...]]
[--hyb_placeholder HYB_PLACEHOLDER]
[--reorder_flags {True,False}]
[--allow_undefined_flags [{True,False}]]
[--allow_unknown_seg_types [{True,False}]]
[--hybformat_id [{True,False}]]
[--hybformat_ref [{True,False}]]
[--allowed_mismatches ALLOWED_MISMATCHES]
[--fold_placeholder FOLD_PLACEHOLDER] [-y {static,dynamic}]
[--error_mode {raise,warn_return,return}]
[--error_checks {hybrecord_indel,foldrecord_nofold,max_mismatch,energy_mismatch}]
[--iter_error_mode {raise,warn_return,warn_skip,skip,return}]
[--max_sequential_skips MAX_SEQUENTIAL_SKIPS]
Named Arguments
- -i, --in_hyb
REQUIRED path to one or more hyb-format files with a ".hyb" suffix for use in the evaluation.
- -f, --in_fold
Path to one or more RNA secondary-structure files with a ".vienna" or ".ct" suffix for use in the evaluation. The usage above lists this argument as optional for hyb_filter.
- -o, --out_hyb
Optional path to one or more hyb-format files for output (should include a ".hyb" suffix). If not provided, the input file path "PATH_TO/MY_FILE.HYB" will be used as a template for the output path "OUT_DIR/MY_FILE_OUT.HYB".
- -l, --out_fold
Optional path to one or more ".vienna" or ".ct"-format files for output (should include the appropriate ".vienna"/".ct" suffix). If not provided, the input file path "PATH_TO/MY_FILE.VIENNA" will be used as a template for the output path "OUT_DIR/MY_FILE_OUT.VIENNA".
- -d, --out_dir
Path to directory for output of files. Defaults to the current working directory.
Default: $PWD
- -u, --out_suffix
Suffix to add to the name of output files, before any file- or analysis-specific suffixes. The file-type appropriate suffix will be added automatically.
Default: "_filtered"
- -m, --filter_mode
Possible choices: all, any
Modes for evaluating multiple filters. The "all" mode requires all provided filters to be true for inclusion. The "any" mode requires only one provided filter to be true for inclusion. (Note: matching any exclusion filter is grounds for exclusion of record.)
Default: "all"
- --skip_dup_id_before
Skip sequential duplicate read IDs before filtering.
Default: False
- --skip_dup_id_after
Skip sequential duplicate read IDs after filtering.
Default: False
- --filter
Filter criteria #1. Records matching the criteria will be included in output. Includes a filter type, Ex: "seg_name_contains", and an argument, Ex: "ENST00000340384". (Note: not all filter types require a second argument, for Example: "has_mirna_seg")
- --filter_2
Filter criteria #2. Records matching the criteria will be included in output. Includes a filter type, Ex: "seg_name_contains", and an argument, Ex: "ENST00000340384". (Note: not all filter types require a second argument, for Example: "has_mirna_seg")
- --filter_3
Filter criteria #3. Records matching the criteria will be included in output. Includes a filter type, Ex: "seg_name_contains", and an argument, Ex: "ENST00000340384". (Note: not all filter types require a second argument, for Example: "has_mirna_seg")
- --exclude
Exclusion filter criteria #1. Records matching the criteria will be excluded from output. Includes a filter type, Ex: "seg_name_contains", and an argument, Ex: "ENST00000340384". (Note: not all filter types require a second argument, for Example: "has_mirna_seg")
- --exclude_2
Exclusion filter criteria #2. Records matching the criteria will be excluded from output. Includes a filter type, Ex: "seg_name_contains", and an argument, Ex: "ENST00000340384". (Note: not all filter types require a second argument, for Example: "has_mirna_seg")
- --exclude_3
Exclusion filter criteria #3. Records matching the criteria will be excluded from output. Includes a filter type, Ex: "seg_name_contains", and an argument, Ex: "ENST00000340384". (Note: not all filter types require a second argument, for Example: "has_mirna_seg")
- --set_dataset
Set "dataset" flag to value of the input file name.
Default: False
- --version
Print version and exit.
- -v, --verbose
Print verbose output during run.
Default: False
- -s, --silent
Print no output during run.
Default: False
Hyb Record Settings
- --mirna_types
"seg_type" fields identifying a miRNA
Default: ['miRNA', 'microRNA']
- --custom_flags
Custom flags to allow in addition to those specified in the hybkit specification.
Default: []
- --hyb_placeholder
Placeholder character/string for missing data in hyb files.
Default: "."
- --reorder_flags
Possible choices: True, False
Re-order flags to the hybkit-specification order when writing hyb records.
Default: True
- --allow_undefined_flags
Possible choices: True, False
Allow use of flags not defined in the hybkit specification when reading and writing hyb records. As the preferred alternative to using this setting, the --custom_flags argument can be used to supply custom allowed flags.
Default: False
- --allow_unknown_seg_types
Possible choices: True, False
Allow unknown segment types when assigning segment types.
Default: False
Hyb File Settings
- --hybformat_id
Possible choices: True, False
The Hyb Software Package places further information in the "id" field of the hybrid record that can be used to infer the number of contained read counts. When set to True, the identifiers will be parsed as: "<read_id>_<read_count>"
Default: False
- --hybformat_ref
Possible choices: True, False
The Hyb Software Package uses a reference database with identifiers that contain sequence type and other sequence information. When set to True, all hyb file identifiers will be parsed as: "<gene_id>_<transcript_id>_<gene_name>_<seg_type>"
Default: False
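The two identifier layouts described for --hybformat_id and --hybformat_ref can be illustrated with a small Python sketch. The function names are hypothetical and the example identifiers are made up. The sketch also assumes that only the gene name may itself contain underscores, which is a guess for illustration and not a documented hybkit rule.

```python
def parse_hybformat_id(record_id):
    """Split "<read_id>_<read_count>" into (read_id, read_count)."""
    read_id, _, count = record_id.rpartition("_")
    if not read_id or not count.isdigit():
        raise ValueError(f"Not in <read_id>_<read_count> form: {record_id!r}")
    return read_id, int(count)


def parse_hybformat_ref(ref_id):
    """Split "<gene_id>_<transcript_id>_<gene_name>_<seg_type>" into a dict."""
    try:
        # First two fields from the left, last field from the right.
        gene_id, transcript_id, rest = ref_id.split("_", 2)
        gene_name, seg_type = rest.rsplit("_", 1)
    except ValueError:
        raise ValueError(f"Not a four-field reference id: {ref_id!r}") from None
    return {
        "gene_id": gene_id,
        "transcript_id": transcript_id,
        "gene_name": gene_name,
        "seg_type": seg_type,
    }


# Made-up identifiers, used only to show the field layout.
print(parse_hybformat_id("read42_7"))
print(parse_hybformat_ref("GENE0001_TRANSCRIPT0001_MYGENE_mRNA"))
```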
Fold Record Settings
- --allowed_mismatches
For DynamicFoldRecords, allowed number of mismatches with a HybRecord.
Default: 0
- --fold_placeholder
Placeholder character/string for missing data for reading/writing fold records.
Default: "."
- -y, --seq_type
Possible choices: static, dynamic
Type of fold record object to use. Options: "static": FoldRecord, requires an exact sequence match to be paired with a HybRecord; "dynamic": DynamicFoldRecord, requires a sequence match to the "dynamic" annotated regions of a HybRecord, and may be shorter/longer than the original sequence.
Default: "static"
- --error_mode
Possible choices: raise, warn_return, return
Mode for handling errors during reading of HybFiles (overridden by HybFoldIter.settings['iter_error_mode'] when using HybFoldIter). Options: "raise": Raise an error when encountered and exit the program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no program output.
Default: "raise"
Hyb-Fold Iterator Settings
- --error_checks
Possible choices: hybrecord_indel, foldrecord_nofold, max_mismatch, energy_mismatch
Error checks for simultaneous HybFile and FoldFile parsing. Options: "hybrecord_indel": Error for HybRecord objects where one/both sequences have insertions/deletions in alignment, which prevents matching of sequences; "foldrecord_nofold": Error when failure in reading a fold_record object; "max_mismatch": Error when mismatch between hybrecord and foldrecord sequences is greater than FoldRecord "allowed_mismatches" setting; "energy_mismatch": Error when a mismatch exists between HybRecord and FoldRecord energy values.
Default: ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch', 'energy_mismatch']
- --iter_error_mode
Possible choices: raise, warn_return, warn_skip, skip, return
Mode for handling errors found during error checks. Overrides HybRecord "error_mode" setting when using HybFoldIter. Options: "raise": Raise an error when encountered; "warn_return": Print a warning and return the value; "warn_skip": Print a warning and continue to the next iteration; "skip": Continue to the next iteration without any output; "return": return the value without any error output;
Default: "warn_skip"
- --max_sequential_skips
Maximum number of record(-pairs) to skip in a row. Limited as several sequential skips usually indicates an issue with record formatting or a desynchronization between files.
Default: 100
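The interaction between --iter_error_mode and --max_sequential_skips can be pictured with the following minimal Python sketch. It is not hybkit's implementation: the generator iter_checked, the check callback, and the choice to reset the skip counter whenever a record is yielded are all assumptions made for illustration.

```python
import warnings

MODES = ("raise", "warn_return", "warn_skip", "skip", "return")


def iter_checked(records, check, iter_error_mode="warn_skip",
                 max_sequential_skips=100):
    """Yield records, handling those that fail ``check`` per the error mode.

    ``check`` returns None for a good record, or a message string otherwise.
    """
    if iter_error_mode not in MODES:
        raise ValueError(f"Unknown iter_error_mode: {iter_error_mode!r}")
    sequential_skips = 0
    for record in records:
        problem = check(record)
        if problem is not None:
            if iter_error_mode == "raise":
                raise RuntimeError(problem)
            if iter_error_mode.startswith("warn"):
                warnings.warn(problem)
            if iter_error_mode in ("warn_skip", "skip"):
                # Drop the record and count consecutive skips.
                sequential_skips += 1
                if sequential_skips > max_sequential_skips:
                    raise RuntimeError(
                        f"Skipped {sequential_skips} records in a row; "
                        "inputs may be malformed or out of sync")
                continue
        # Good record, or a "return"-style mode: hand the record back.
        sequential_skips = 0
        yield record


# Odd numbers stand in for records that fail an error check.
is_bad = lambda n: "odd record" if n % 2 else None
print(list(iter_checked(range(6), is_bad, iter_error_mode="skip")))
```

Capping consecutive skips turns a long run of silent failures, which the text above attributes to formatting problems or desynchronized files, into a single visible error.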
- Output File Naming:
Output files can be named in two fashions: via automatic name generation, or by providing specific out file names.
- Automatic Name Generation:
For output name generation, the default respective naming scheme is used:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --> OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
This output file path can be modified with the arguments {--out_dir, --out_suffix} described below.
The output directory defaults to the current working directory ($PWD), and can be modified with the --out_dir <dir> argument. Note: The provided directory must exist, or an error will be raised. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_dir MY_OUT_DIR
--> MY_OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
The suffix used for output files is based on the primary actions of the script. It can be specified using --out_suffix <suffix>. This can optionally include the ".hyb" final suffix. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX
--> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
#OR
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX.HYB
--> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
- Specific Output Names:
Alternatively, specific file names can be provided via the -o/--out_hyb argument, ensuring that the same number of input and output files are provided. This argument takes precedence over all automatic output file naming options (--out_dir, --out_suffix), which are ignored if -o/--out_hyb is provided. For Example:
hyb_script [...] --out_hyb MY_OUT_DIR/OUT_FILE_1.HYB MY_OUT_DIR/OUT_FILE_2.HYB
--> MY_OUT_DIR/OUT_FILE_1.hyb
--> MY_OUT_DIR/OUT_FILE_2.hyb
Note: The directory provided with output file paths (MY_OUT_DIR above) must exist, otherwise an error will be raised.
hyb_eval
Read hyb files (and optional matched fold files) and evaluate the contained hybrids.
This utility reads in one or more files in hyb-format (see the hybkit Hyb File Specification) and corresponding fold files (.vienna or .ct) and evaluates hybrid record properties.
Evaluation Types:
type: Assigns types to each segment within hyb records.
mirna: Assigns which segments are a miRNA based on segment types.
type Evaluation:
- The 'type' evaluation uses the hybkit.HybRecord.eval_types() method to assign the record flags seg1_type and seg2_type.
- Example system calls:
$ hyb_eval -t type -i my_file_1.hyb
$ hyb_eval -t type -i my_file_1.hyb -f my_file_1.vienna
$ hyb_eval -t type \
    -i my_file_1.hyb my_file_2.hyb \
    -f my_file_1.vienna my_file_2.vienna \
    --type_method string_match \
    --type_params_file my_parameters_file.csv \
    --allow_unknown_seg_types
mirna Evaluation:
- The 'mirna' evaluation uses the hybkit.HybRecord.eval_mirna() method to identify properties relating to miRNA within the hybrids, including miRNA presence and positions. This evaluation requires the seg_type flags to be filled, either by a type evaluation, or by parsing the read using the --hybformat_ref True option with a hyb-format reference. The mirna_seg flag is then set for each record, indicating the presence and position of any miRNA within the hybrid.
- Example system calls:
$ hyb_eval -t mirna -i my_file_1.hyb
$ hyb_eval -t mirna -i my_file_1.hyb -f my_file_1.vienna
$ hyb_eval -t mirna -i my_file_1.hyb my_file_2.hyb \
    -f my_file_1.vienna my_file_2.vienna \
    --mirna_types miRNA kshv-miRNA
- This can also be combined with the type evaluation, as such:
$ hyb_eval -t type mirna -i my_file_1.hyb \
    --type_method string_match \
    --type_params_file my_parameters_file.csv \
    --allow_unknown_seg_types \
    --mirna_types miRNA kshv-miRNA
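As a rough picture of what the mirna evaluation decides, the Python sketch below derives a presence/position label from the two segment types and a list of miRNA type names. It is an illustration only: the function mirna_seg_flag is hypothetical, and the labels "5p", "3p", "B" (both) and "N" (none) are assumed here rather than taken from this page.

```python
def mirna_seg_flag(seg1_type, seg2_type, mirna_types=("miRNA", "microRNA")):
    """Return an assumed mirna_seg label from the two segment types."""
    if seg1_type is None or seg2_type is None:
        # The evaluation requires segment types to be assigned first.
        raise ValueError("seg1_type and seg2_type must be set before evaluation")
    seg1_is_mirna = seg1_type in mirna_types
    seg2_is_mirna = seg2_type in mirna_types
    if seg1_is_mirna and seg2_is_mirna:
        return "B"   # miRNA in both segments
    if seg1_is_mirna:
        return "5p"  # miRNA in the first (5p) segment
    if seg2_is_mirna:
        return "3p"  # miRNA in the second (3p) segment
    return "N"       # no miRNA segment


# Passing extra type names plays the role of --mirna_types.
print(mirna_seg_flag("miRNA", "mRNA"))
print(mirna_seg_flag("mRNA", "kshv-miRNA", mirna_types=("miRNA", "kshv-miRNA")))
```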
usage: hyb_eval [-h] -i PATH_TO/MY_FILE.HYB [PATH_TO/MY_FILE.HYB ...]
[-f [PATH_TO/MY_FILE.VIENNA [PATH_TO/MY_FILE.VIENNA ...]]]
[-o PATH_TO/OUT_FILE.HYB [PATH_TO/OUT_FILE.HYB ...]]
[-l PATH_TO/OUT_FILE.VIENNA [PATH_TO/OUT_FILE.VIENNA ...]]
[-d OUT_DIR] [-u OUT_SUFFIX]
[-t {type,mirna} [{type,mirna} ...]]
[--type_method {hybformat,string_match,id_map}]
[--type_params_file PATH_TO/PARAMETERS_FILE]
[--set_dataset] [--version] [-v | -s]
[--mirna_types MIRNA_TYPES [MIRNA_TYPES ...]]
[--custom_flags CUSTOM_FLAGS [CUSTOM_FLAGS ...]]
[--hyb_placeholder HYB_PLACEHOLDER]
[--reorder_flags {True,False}]
[--allow_undefined_flags [{True,False}]]
[--allow_unknown_seg_types [{True,False}]]
[--hybformat_id [{True,False}]]
[--hybformat_ref [{True,False}]]
[--allowed_mismatches ALLOWED_MISMATCHES]
[--fold_placeholder FOLD_PLACEHOLDER]
[-y {static,dynamic}]
[--error_mode {raise,warn_return,return}]
[--error_checks {hybrecord_indel,foldrecord_nofold,max_mismatch,energy_mismatch}]
[--iter_error_mode {raise,warn_return,warn_skip,skip,return}]
[--max_sequential_skips MAX_SEQUENTIAL_SKIPS]
Named Arguments
- -i, --in_hyb
REQUIRED path to one or more hyb-format files with a ".hyb" suffix for use in the evaluation.
- -f, --in_fold
Path to one or more RNA secondary-structure files with a ".vienna" or ".ct" suffix for use in the evaluation. Fold files are optional for hyb_eval, as noted in the description and usage above.
- -o, --out_hyb
Optional path to one or more hyb-format files for output (should include a ".hyb" suffix). If not provided, the input file path "PATH_TO/MY_FILE.HYB" will be used as a template for the output path "OUT_DIR/MY_FILE_OUT.HYB".
- -l, --out_fold
Optional path to one or more ".vienna" or ".ct"-format files for output (should include the appropriate ".vienna"/".ct" suffix). If not provided, the input file path "PATH_TO/MY_FILE.VIENNA" will be used as a template for the output path "OUT_DIR/MY_FILE_OUT.VIENNA".
- -d, --out_dir
Path to directory for output of files. Defaults to the current working directory.
Default: $PWD
- -u, --out_suffix
Suffix to add to the name of output files, before any file- or analysis-specific suffixes. The file-type appropriate suffix will be added automatically.
Default: "_evaluated"
- -t, --eval_types
Possible choices: type, mirna
Types of evaluations to perform on input hyb file. (Note: evaluations can be combined, such as "--eval_types type mirna")
Default: ['type']
- --set_dataset
Set "dataset" flag to value of the input file name.
Default: False
- --version
Print version and exit.
- -v, --verbose
Print verbose output during run.
Default: False
- -s, --silent
Print no output during run.
Default: False
type Analysis Options
- --type_method
Possible choices: hybformat, string_match, id_map
Segment-type finding method to use for type evaluation. For a description of the different methods, see the HybRecord documentation for the eval_types method.
Default: "hybformat"
- --type_params_file
Segment-type finding parameters file to use for type evaluation with some type finding methods: {string_match, id_map}. For a description of the different methods, see the HybRecord documentation for the eval_types method.
Hyb Record Settings
- --mirna_types
"seg_type" fields identifying a miRNA
Default: ['miRNA', 'microRNA']
- --custom_flags
Custom flags to allow in addition to those specified in the hybkit specification.
Default: []
- --hyb_placeholder
Placeholder character/string for missing data in hyb files.
Default: "."
- --reorder_flags
Possible choices: True, False
Re-order flags to the hybkit-specification order when writing hyb records.
Default: True
- --allow_undefined_flags
Possible choices: True, False
Allow use of flags not defined in the hybkit specification when reading and writing hyb records. As the preferred alternative to using this setting, the --custom_flags argument can be used to supply custom allowed flags.
Default: False
- --allow_unknown_seg_types
Possible choices: True, False
Allow unknown segment types when assigning segment types.
Default: False
Hyb File Settings
- --hybformat_id
Possible choices: True, False
The Hyb Software Package places further information in the "id" field of the hybrid record that can be used to infer the number of contained read counts. When set to True, the identifiers will be parsed as: "<read_id>_<read_count>"
Default: False
- --hybformat_ref
Possible choices: True, False
The Hyb Software Package uses a reference database with identifiers that contain sequence type and other sequence information. When set to True, all hyb file identifiers will be parsed as: "<gene_id>_<transcript_id>_<gene_name>_<seg_type>"
Default: False
Fold Record Settings
- --allowed_mismatches
For DynamicFoldRecords, allowed number of mismatches with a HybRecord.
Default: 0
- --fold_placeholder
Placeholder character/string for missing data for reading/writing fold records.
Default: "."
- -y, --seq_type
Possible choices: static, dynamic
Type of fold record object to use. Options: "static": FoldRecord, requires an exact sequence match to be paired with a HybRecord; "dynamic": DynamicFoldRecord, requires a sequence match to the "dynamic" annotated regions of a HybRecord, and may be shorter/longer than the original sequence.
Default: "static"
- --error_mode
Possible choices: raise, warn_return, return
Mode for handling errors during reading of HybFiles (overridden by HybFoldIter.settings['iter_error_mode'] when using HybFoldIter). Options: "raise": Raise an error when encountered and exit the program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no program output.
Default: "raise"
Hyb-Fold Iterator Settings
- --error_checks
Possible choices: hybrecord_indel, foldrecord_nofold, max_mismatch, energy_mismatch
Error checks for simultaneous HybFile and FoldFile parsing. Options: "hybrecord_indel": Error for HybRecord objects where one/both sequences have insertions/deletions in alignment, which prevents matching of sequences; "foldrecord_nofold": Error when failure in reading a fold_record object; "max_mismatch": Error when mismatch between hybrecord and foldrecord sequences is greater than FoldRecord "allowed_mismatches" setting; "energy_mismatch": Error when a mismatch exists between HybRecord and FoldRecord energy values.
Default: ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch', 'energy_mismatch']
- --iter_error_mode
Possible choices: raise, warn_return, warn_skip, skip, return
Mode for handling errors found during error checks. Overrides HybRecord "error_mode" setting when using HybFoldIter. Options: "raise": Raise an error when encountered; "warn_return": Print a warning and return the value; "warn_skip": Print a warning and continue to the next iteration; "skip": Continue to the next iteration without any output; "return": return the value without any error output;
Default: "warn_skip"
- --max_sequential_skips
Maximum number of record(-pairs) to skip in a row. Limited as several sequential skips usually indicates an issue with record formatting or a desynchronization between files.
Default: 100
- Output File Naming:
Output files can be named in two fashions: via automatic name generation, or by providing specific out file names.
- Automatic Name Generation:
For output name generation, the default respective naming scheme is used:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --> OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
This output file path can be modified with the arguments {--out_dir, --out_suffix} described below.
The output directory defaults to the current working directory ($PWD), and can be modified with the --out_dir <dir> argument. Note: The provided directory must exist, or an error will be raised. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_dir MY_OUT_DIR
--> MY_OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
The suffix used for output files is based on the primary actions of the script. It can be specified using --out_suffix <suffix>. This can optionally include the ".hyb" final suffix. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX
--> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
#OR
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX.HYB
--> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
- Specific Output Names:
Alternatively, specific file names can be provided via the -o/--out_hyb argument, ensuring that the same number of input and output files are provided. This argument takes precedence over all automatic output file naming options (--out_dir, --out_suffix), which are ignored if -o/--out_hyb is provided. For Example:
hyb_script [...] --out_hyb MY_OUT_DIR/OUT_FILE_1.HYB MY_OUT_DIR/OUT_FILE_2.HYB
--> MY_OUT_DIR/OUT_FILE_1.hyb
--> MY_OUT_DIR/OUT_FILE_2.hyb
Note: The directory provided with output file paths (MY_OUT_DIR above) must exist, otherwise an error will be raised.
hyb_analyze
Read hyb / vienna files and analyze the fold information in the contained hybrid sequences.
Analysis Types:
This utility reads in one or more files in hyb-format (see the hybkit Hyb File Specification) along with a corresponding predicted RNA secondary structure file in the "Vienna" (Vienna Format) or "CT" (CT_Format) formats, and analyzes hybrid secondary structure properties.
For full information on the different analysis types, see the Analyses section of the hybkit documentation.
- Example system calls:
$ hyb_analyze -a fold -i my_file_1.hyb -f my_file_1.vienna
$ hyb_analyze -a fold -i my_file_2.hyb -f my_file_2.ct \
    --make_plots False
usage: hyb_analyze [-h] -i PATH_TO/MY_FILE.HYB [PATH_TO/MY_FILE.HYB ...]
[-f [PATH_TO/MY_FILE.VIENNA [PATH_TO/MY_FILE.VIENNA ...]]]
[-o PATH_TO/OUT_BASENAME [PATH_TO/OUT_BASENAME ...]]
[-d OUT_DIR] [-u OUT_SUFFIX]
[-a {energy,type,mirna,target,fold} [{energy,type,mirna,target,fold} ...]]
[-n ANALYSIS_NAME] [-p {True,False}] [--version] [-v | -s]
[--mirna_types MIRNA_TYPES [MIRNA_TYPES ...]]
[--custom_flags CUSTOM_FLAGS [CUSTOM_FLAGS ...]]
[--hyb_placeholder HYB_PLACEHOLDER]
[--reorder_flags {True,False}]
[--allow_undefined_flags [{True,False}]]
[--allow_unknown_seg_types [{True,False}]]
[--hybformat_id [{True,False}]]
[--hybformat_ref [{True,False}]]
[--allowed_mismatches ALLOWED_MISMATCHES]
[--fold_placeholder FOLD_PLACEHOLDER]
[-y {static,dynamic}]
[--error_mode {raise,warn_return,return}]
[--error_checks {hybrecord_indel,foldrecord_nofold,max_mismatch,energy_mismatch}]
[--iter_error_mode {raise,warn_return,warn_skip,skip,return}]
[--max_sequential_skips MAX_SEQUENTIAL_SKIPS]
[--quant_mode {single,reads,records}]
[--out_delim OUT_DELIM]
Named Arguments
- -i, --in_hyb
REQUIRED path to one or more hyb-format files with a ".hyb" suffix for use in the evaluation.
- -f, --in_fold
REQUIRED path to one or more RNA secondary-structure files with a ".vienna" or ".ct" suffix for use in the evaluation.
- -o, --out_basename
Optional path to one or more basename prefixes to use for output. The appropriate suffix will be added based on the specific name. If not provided, the output for input file "PATH_TO/MY_FILE.HYB" will be used as a template for the basename "OUT_DIR/MY_FILE".
- -d, --out_dir
Path to directory for output of files. Defaults to the current working directory.
Default: $PWD
- -u, --out_suffix
Suffix to add to the name of output files, before any file- or analysis-specific suffixes. The file-type appropriate suffix will be added automatically.
- -a, --analysis_types
Possible choices: energy, type, mirna, target, fold
Analysis to perform on input hyb and fold files.
Default: "fold"
- -n, --analysis_name
Name / title of analysis data.
- -p, --make_plots
Possible choices: True, False
Create plots of analysis output.
Default: True
- --version
Print version and exit.
- -v, --verbose
Print verbose output during run.
Default: False
- -s, --silent
Print no output during run.
Default: False
Hyb Record Settings
- --mirna_types
"seg_type" fields identifying a miRNA
Default: ['miRNA', 'microRNA']
- --custom_flags
Custom flags to allow in addition to those specified in the hybkit specification.
Default: []
- --hyb_placeholder
Placeholder character/string for missing data in hyb files.
Default: "."
- --reorder_flags
Possible choices: True, False
Re-order flags to the hybkit-specification order when writing hyb records.
Default: True
- --allow_undefined_flags
Possible choices: True, False
Allow use of flags not defined in the hybkit specification when reading and writing hyb records. As the preferred alternative to using this setting, the --custom_flags argument can be used to supply custom allowed flags.
Default: False
- --allow_unknown_seg_types
Possible choices: True, False
Allow unknown segment types when assigning segment types.
Default: False
Hyb File Settings
- --hybformat_id
Possible choices: True, False
The Hyb Software Package places further information in the "id" field of the hybrid record that can be used to infer the number of contained read counts. When set to True, the identifiers will be parsed as: "<read_id>_<read_count>"
Default: False
- --hybformat_ref
Possible choices: True, False
The Hyb Software Package uses a reference database with identifiers that contain sequence type and other sequence information. When set to True, all hyb file identifiers will be parsed as: "<gene_id>_<transcript_id>_<gene_name>_<seg_type>"
Default: False
Fold Record Settings
- --allowed_mismatches
For DynamicFoldRecords, allowed number of mismatches with a HybRecord.
Default: 0
- --fold_placeholder
Placeholder character/string for missing data for reading/writing fold records.
Default: "."
- -y, --seq_type
Possible choices: static, dynamic
Type of fold record object to use. Options: "static": FoldRecord, requires an exact sequence match to be paired with a HybRecord; "dynamic": DynamicFoldRecord, requires a sequence match to the "dynamic" annotated regions of a HybRecord, and may be shorter/longer than the original sequence.
Default: "static"
- --error_mode
Possible choices: raise, warn_return, return
Mode for handling errors during reading of HybFiles (overridden by HybFoldIter.settings['iter_error_mode'] when using HybFoldIter). Options: "raise": Raise an error when encountered and exit the program; "warn_return": Print a warning and return the error_value; "return": Return the error value with no program output.
Default: "raise"
Hyb-Fold Iterator Settings
- --error_checks
Possible choices: hybrecord_indel, foldrecord_nofold, max_mismatch, energy_mismatch
Error checks for simultaneous HybFile and FoldFile parsing. Options: "hybrecord_indel": Error for HybRecord objects where one/both sequences have insertions/deletions in alignment, which prevents matching of sequences; "foldrecord_nofold": Error when failure in reading a fold_record object; "max_mismatch": Error when mismatch between hybrecord and foldrecord sequences is greater than FoldRecord "allowed_mismatches" setting; "energy_mismatch": Error when a mismatch exists between HybRecord and FoldRecord energy values.
Default: ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch', 'energy_mismatch']
- --iter_error_mode
Possible choices: raise, warn_return, warn_skip, skip, return
Mode for handling errors found during error checks. Overrides HybRecord "error_mode" setting when using HybFoldIter. Options: "raise": Raise an error when encountered; "warn_return": Print a warning and return the value; "warn_skip": Print a warning and continue to the next iteration; "skip": Continue to the next iteration without any output; "return": return the value without any error output;
Default: "warn_skip"
- --max_sequential_skips
Maximum number of record(-pairs) to skip in a row. Limited as several sequential skips usually indicates an issue with record formatting or a desynchronization between files.
Default: 100
Analysis Settings
- --quant_mode
Possible choices: single, reads, records
Method for counting records. Options: "single": Count each record as a single entry; "reads": Use the number of reads per hyb record as the count (may contain PCR duplicates); "records": Count the number of records represented by each hyb record entry (1 for "unmerged" records, >= 1 for "merged" records)
Default: "single"
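As a rough illustration of how the three counting modes differ, the sketch below tallies two made-up entries. The dictionary keys "reads" and "records" and the helper name record_count are assumptions for this example only; they are not hybkit attribute names.

```python
def record_count(record, quant_mode="single"):
    """Return the count contributed by one entry under quant_mode.

    `record` is a plain dict with hypothetical keys "reads" and "records",
    standing in for the per-entry read count and merged-record count.
    """
    if quant_mode == "single":
        return 1
    if quant_mode == "reads":
        return record["reads"]
    if quant_mode == "records":
        return record["records"]
    raise ValueError(f"Unknown quant_mode: {quant_mode!r}")


# Two example entries: one unmerged, one merged from 3 records (12 reads)
entries = [
    {"reads": 5, "records": 1},
    {"reads": 12, "records": 3},
]
totals = {
    mode: sum(record_count(entry, mode) for entry in entries)
    for mode in ("single", "reads", "records")
}
print(totals)  # {'single': 2, 'reads': 17, 'records': 4}
```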
- --out_delim
Delimiter-string to place between fields in analysis output.
Default: ","
- Output File Naming:
Output files can be named in two fashions: via automatic name generation, or by providing specific out file names.
- Automatic Name Generation:
For automatic output name generation, the following default naming scheme is used:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --> OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
This output file path can be modified with the arguments {--out_dir, --out_suffix} described below.
The output directory defaults to the current working directory ($PWD), and can be modified with the --out_dir <dir> argument. Note: The provided directory must exist, or an error will be raised. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_dir MY_OUT_DIR --> MY_OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
The suffix used for output files is based on the primary actions of the script. It can be specified using --out_suffix <suffix>. This can optionally include the ".hyb" final suffix. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX --> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
#OR
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX.HYB --> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
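The naming scheme above can be sketched in a few lines of Python. The helper make_out_path is hypothetical and is not part of hybkit; it also assumes a lowercase ".hyb" extension on the generated name, which the scripts themselves may handle differently.

```python
from pathlib import Path


def make_out_path(in_path, out_suffix, out_dir="."):
    """Build OUT_DIR/<input stem>_<suffix>.hyb from an input hyb path.

    Hypothetical helper mirroring the naming scheme described above.
    """
    out_dir = Path(out_dir)
    if not out_dir.is_dir():
        raise FileNotFoundError(f"Output directory does not exist: {out_dir}")
    # Accept the suffix with or without a trailing ".hyb" (case-insensitive)
    if out_suffix.lower().endswith(".hyb"):
        out_suffix = out_suffix[: -len(".hyb")]
    stem = Path(in_path).stem  # "PATH_TO/MY_FILE_1.HYB" -> "MY_FILE_1"
    return out_dir / f"{stem}_{out_suffix}.hyb"


# Both suffix forms produce the same file name
name_a = make_out_path("PATH_TO/MY_FILE_1.HYB", "MY_SUFFIX").name
name_b = make_out_path("PATH_TO/MY_FILE_1.HYB", "MY_SUFFIX.HYB").name
```

Checking the directory before building the path mirrors the documented behavior that a missing output directory is an error and is not created automatically.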
- Specific Output Names:
Alternatively, specific file names can be provided via the -o/--out_hyb argument, ensuring that the same number of input and output files are provided. This argument takes precedence over all automatic output file naming options (--out_dir, --out_suffix), which are ignored if -o/--out_hyb is provided. For Example:
hyb_script [...] --out_hyb MY_OUT_DIR/OUT_FILE_1.HYB MY_OUT_DIR/OUT_FILE_2.HYB
--> MY_OUT_DIR/OUT_FILE_1.hyb
--> MY_OUT_DIR/OUT_FILE_2.hyb
Note: The directory provided with output file paths (MY_OUT_DIR above) must exist, otherwise an error will be raised.
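The two requirements for explicit output names (matching input and output counts, and existing output directories) could be checked up front as in the following sketch. The helper check_out_paths is a hypothetical illustration and does not reproduce the checks performed inside the hybkit scripts.

```python
from pathlib import Path


def check_out_paths(in_paths, out_paths):
    """Pair each input with its explicit output path after validation.

    Hypothetical helper; raises if the counts differ or a directory is missing.
    """
    if len(in_paths) != len(out_paths):
        raise ValueError(
            f"Got {len(in_paths)} input file(s) but {len(out_paths)} output file(s)"
        )
    for out_path in out_paths:
        parent = Path(out_path).parent
        if not parent.is_dir():
            raise FileNotFoundError(f"Output directory does not exist: {parent}")
    return list(zip(in_paths, out_paths))


# Usage: one input paired with one output in the current directory
pairs = check_out_paths(["MY_FILE_1.hyb"], ["OUT_FILE_1.hyb"])
```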
Example Analyses
This section includes multiple example stepwise analyses of data from a qCLASH experiment described in [Gay2018], with data acquired from the NCBI Gene Expression Omnibus (GEO) accession GSE101978.
Each analysis is implemented both using the Python3 API, and as a sequence of shell executable commands in a bash script. The Python API implementations are generally significantly more efficient as more steps can be performed on a single iteration over the input data.
Each analysis performs quality control steps on the data by checking data integrity (hyb_check) and removing artifactual ribosomal- and mitochondrial-RNA hybrids (hyb_filter). Further filtration may be performed, and then each described analysis is carried out.
Pipeline
Description
Type-miRNA Analysis
Quantify the sequence and miRNA types in a hyb file
Target Analysis
Analyze targets of a set of miRNAs from a single experiment
Grouped Target Analysis
Analyze and plot targets of a set of miRNAs from pooled experimental replicates
Fold Analysis
Analyze and plot predicted miRNA folding patterns in miRNA-containing hybrids
Further details on each respective example analysis can be found in each section.
Example Type-miRNA Analysis
This directory contains an example analysis of Hyb-format data, published in the quick Crosslinking and Sequencing of Hybrids (qCLASH) experiment described in: Gay, Lauren A., et al. "Modified cross-linking, ligation, and sequencing of hybrids (qCLASH) identifies Kaposi's Sarcoma-associated herpesvirus microRNA targets in endothelial cells." Journal of virology 92.8 (2018): e02138-17.
- The analysis is carried out in multiple example implementations which produce identical output:
via the Command-Line
via the Python3 API
This analysis first performs quality control on the data. It then summarizes and analyzes the hybrid, segment, and miRNA characteristics of each input file. Analyses for each individual file, and a combined summary analysis are all produced.
The sequencing information is available at NCBI Gene Expression Omnibus (GEO) GSE101978, at:
The data files can be downloaded and uncompressed by using the command:
$ sh ./download_data.sh
The unpacked hyb data-files require ~2 GB of space. The completed output of the analysis requires ~1.5 GB of space.
Type-miRNA Analysis Example Output



Example Target Analysis
This directory contains an example analysis of Hyb-format data, published in the quick Crosslinking and Sequencing of Hybrids (qCLASH) experiment described in: Gay, Lauren A., et al. "Modified cross-linking, ligation, and sequencing of hybrids (qCLASH) identifies Kaposi's Sarcoma-associated herpesvirus microRNA targets in endothelial cells." Journal of virology 92.8 (2018): e02138-17.
- The analysis is carried out in multiple example implementations which produce identical output:
via the Command-Line
via the Python3 API
This analysis specifically filters and analyzes the kshv-miR-K12-5 miRNA arising from Kaposi's Sarcoma-Associated Herpesvirus (KSHV), which has the assigned type "KSHV-miRNA". Both individual and summary output files are produced.
Hybrid sequences generated by the Hyb program are available at NCBI Gene Expression Omnibus (GEO) GSE101978, at:
The data files can be downloaded and uncompressed by using the command:
$ sh ./download_data.sh
The unpacked hyb data-file requires ~130 MB of space. The completed output of the analysis requires ~20 MB of space.
Target Analysis Example Output


Example Grouped Target Analysis
This directory contains an example analysis of Hyb-format data, published in the quick Crosslinking and Sequencing of Hybrids (qCLASH) experiment described in: Gay, Lauren A., et al. "Modified cross-linking, ligation, and sequencing of hybrids (qCLASH) identifies Kaposi's Sarcoma-associated herpesvirus microRNA targets in endothelial cells." Journal of virology 92.8 (2018): e02138-17.
- The analysis is carried out in multiple example implementations which produce identical output:
via the Command-Line
via the Python3 API
This analysis specifically investigates and characterizes miRNA arising from six experimental replicates from two conditions with cells infected with Kaposi's Sarcoma Herpesvirus, which are given the type name "KSHV_miRNA". The hybrid reads from KSHV miRNA are grouped and analyzed together. Both individual and summary output files are produced.
Hybrid sequence information created by the Hyb program is available at NCBI Gene Expression Omnibus (GEO) GSE101978, at:
The data files can be downloaded and uncompressed by using the command:
$ sh ./download_data.sh
The unpacked hyb data-file requires ~1.3 GB of space. The completed output of the analysis requires ~40 MB of space.
Grouped Target Analysis Example Output

Example Fold Analysis
This directory contains an example analysis of Hyb-format and Vienna-format data, published in the quick Crosslinking and Sequencing of Hybrids (qCLASH) experiment described in: Gay, Lauren A., et al. "Modified cross-linking, ligation, and sequencing of hybrids (qCLASH) identifies Kaposi's Sarcoma-associated herpesvirus microRNA targets in endothelial cells." Journal of virology 92.8 (2018): e02138-17.
- The analysis is carried out in multiple example implementations which produce identical output:
via the Command-Line
via the Python3 API
This analysis investigates the predicted folding of miRNA from an experimental replicate infected with Kaposi's Sarcoma-Associated Herpesvirus (KSHV), which are given the type name "KSHV-miRNA". The predicted fold for each hybrid record produced by the "Hyb" program is analyzed, and the folds of each KSHV miRNA with a non-miRNA target are characterized to determine the per-base folding patterns.
Hybrid sequence information created by the Hyb program and the fold output are provided with the hybkit package in the databases directory, created by downstream analysis of files available at NCBI Gene Expression Omnibus (GEO) GSE101978, at:
The data files can be copied and uncompressed by using the command:
$ sh ./prepare_data.sh
The unpacked data-files require ~150 MB of space. The completed output of the analysis requires ~30 MB of space.
Fold Analysis Example Output



About
Renne Lab
Principal Investigator: Rolf Renne
Henry E. Innes Professor of Cancer Research
University of Florida
UF Health Cancer Center
UF Department of Molecular Genetics and Microbiology
UF Genetics Institute
Lead Developer
Daniel Stribling <ds@ufl.edu>
University of Florida, Renne Lab
Changelog
0.3.4 (2023-11) Changes include:
Misc Bugfixes and Refinements
Switch code linting to Ruff
Add hybkit.errors module and HybkitError classes
Moved printing of warnings to python logging module
Add option for direct passage of file-like object for construction of HybFile and ViennaFile
Add HybRecord.to_csv_header(), HybRecord.to_fields(), and HybRecord.to_fields_header() methods
Refine description of HybFile.open() constructor method
Add typing_extensions dependency
Add Python3.8-compatible type hints to API
Documentation Updates
0.3.3 (2023-09) Changes include:
Misc Bugfixes and Refinements
Update integer bar-plot functions
0.3.2 (2023-08) Changes include:
Misc Bugfixes and Refinements
Add duplicate hybrid filtration (by HybRecord.id) options to hyb_filter
Add duplicate hybrid filtration to example analyses
0.3.1 (2023-08) Changes include:
Misc Bugfixes and Refinements
Add --version flag to scripts
Move scripts' output file description to argparse epilog
Refine plot functions
Change default plot colors to the Bang Wong scheme [Wong2011] for colorblind accessibility
Documentation corrections
Spellcheck
0.3.0 (2023-04) Major Codebase And API Overhaul. Changes include:
Simplifying HybRecord API
Simplifying FoldRecord API
Unifying settings information for argparse and modules
Removing Support for ViennaD format
Moving identifier-parsing code to module type_finder
Moving target region analysis code to module region_finder
Moving code for settings into a "settings" module
Renamed HybRecord type_analysis and mirna_analysis to eval_types and eval_mirna, respectively, to differentiate from analysis module functions
Reimplemented analyses methods within a single Analysis class
Added error checking / catching to HybFoldIter
Removed Target-Region Analysis and associated files due to lack of archival database information, pending future development
Added "dynamic" seq_type to FoldRecord for non-identical fold/hybrid sequence handling
Added shell implementation to all example analyses
Remove support for Python3.6, Python3.7
Migrate to CircleCI for CI/CD
Added Pytest unit testing integrated with CircleCI
Other Misc. Improvements / Bugfixes
0.2.0 (2020-03) Added Command-line Toolkit. Code Refinements.
0.1.9 (2020-03) Fix for Module Path Finding for Python > 3.6
0.1.8 (2020-03) Streamlining, PyPI / PIP Initial Release
0.1.0 (2020-01) Initial Implementation
References
- ViennaFormat
- CTFormat
- Zuker2003
Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 2003 Jul 1;31(13):3406-15. doi: 10.1093/nar/gkg595. PMID: 12824337; PMCID: PMC169194.
- Hunter2007
J. Hunter, "Matplotlib: A 2D Graphics Environment" in Computing in Science & Engineering, vol. 9, no. 03, pp. 90-95, 2007. doi: 10.1109/MCSE.2007.55
- Cock2009
Cock PJ, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJ. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009 Jun 1;25(11):1422-3. doi: 10.1093/bioinformatics/btp163. Epub 2009 Mar 20. PMID: 19304878; PMCID: PMC2682512.
- Lorenz2011
Lorenz, R., Bernhart, S.H., Höner zu Siederdissen, C. et al. ViennaRNA Package 2.0. Algorithms Mol Biol 6, 26 (2011). doi: 10.1186/1748-7188-6-26
- Wong2011
Wong, B. Points of view: Color blindness. Nat Methods 8, 441 (2011). doi: 10.1038/nmeth.1618
- Helwak2013
Helwak A, Kudla G, Dudnakova T, Tollervey D. Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell. 2013 Apr 25;153(3):654-65. doi: 10.1016/j.cell.2013.03.043. PMID: 23622248; PMCID: PMC3650559.
- Travis2014
Travis AJ, et al. Hyb: a bioinformatics pipeline for the analysis of CLASH (crosslinking, ligation and sequencing of hybrids) data. Methods. 2014 Feb;65(3):263-73. doi: 10.1016/j.ymeth.2013.10.015.
- Gay2018
Gay LA, Sethuraman S, Thomas M, Turner PC, Renne R. Modified Cross-Linking, Ligation, and Sequencing of Hybrids (qCLASH) Identifies Kaposi's Sarcoma-Associated Herpesvirus MicroRNA Targets in Endothelial Cells. J Virol. 2018 Mar 28;92(8):e02138-17. doi: 10.1128/JVI.02138-17. PMID: 29386283; PMCID: PMC5874430.