hyb_analyze
Read hyb and fold (Vienna or CT) files and analyze the fold information in the contained hybrid sequences.
Analysis Types:
This utility reads in one or more files in hyb format (see the hybkit Hyb File Specification), each with a corresponding predicted RNA secondary-structure file in either "Vienna" (Vienna Format) or "CT" (CT Format) format, and analyzes the properties of the hybrid secondary structures.
For full information on the different analysis types, see the Analyses section of the hybkit documentation.
- Example system calls:
$ hyb_analyze -a fold -i my_file_1.hyb -f my_file_1.vienna
$ hyb_analyze -a fold -i my_file_2.hyb -f my_file_2.ct \
    --make_plots False
usage: hyb_analyze [-h] -i PATH_TO/MY_FILE.HYB [PATH_TO/MY_FILE.HYB ...]
[-f [PATH_TO/MY_FILE.VIENNA [PATH_TO/MY_FILE.VIENNA ...]]]
[-o PATH_TO/OUT_BASENAME [PATH_TO/OUT_BASENAME ...]]
[-d OUT_DIR] [-u OUT_SUFFIX]
[-a {energy,type,mirna,target,fold} [{energy,type,mirna,target,fold} ...]]
[-n ANALYSIS_NAME] [-p {True,False}] [--version] [-v | -s]
[--mirna_types MIRNA_TYPES [MIRNA_TYPES ...]]
[--custom_flags CUSTOM_FLAGS [CUSTOM_FLAGS ...]]
[--hyb_placeholder HYB_PLACEHOLDER]
[--reorder_flags {True,False}]
[--allow_undefined_flags [{True,False}]]
[--allow_unknown_seg_types [{True,False}]]
[--hybformat_id [{True,False}]]
[--hybformat_ref [{True,False}]]
[--allowed_mismatches ALLOWED_MISMATCHES]
[--fold_placeholder FOLD_PLACEHOLDER]
[-y {static,dynamic}]
[--error_mode {raise,warn_return,return}]
[--error_checks {hybrecord_indel,foldrecord_nofold,max_mismatch,energy_mismatch}]
[--iter_error_mode {raise,warn_return,warn_skip,skip,return}]
[--max_sequential_skips MAX_SEQUENTIAL_SKIPS]
[--quant_mode {single,reads,records}]
[--out_delim OUT_DELIM]
Named Arguments
- -i, --in_hyb
REQUIRED path to one or more hyb-format files with a ".hyb" suffix for use in the evaluation.
- -f, --in_fold
REQUIRED path to one or more RNA secondary-structure files with a ".vienna" or ".ct" suffix for use in the evaluation.
- -o, --out_basename
Optional path to one or more basename prefixes to use for output. The appropriate suffix will be added based on the specific analysis. If not provided, the input file path "PATH_TO/MY_FILE.HYB" will be used as a template for the output basename "OUT_DIR/MY_FILE".
- -d, --out_dir
Path to directory for output of files. Defaults to the current working directory.
Default: $PWD
- -u, --out_suffix
Suffix to add to the name of output files, before any file- or analysis-specific suffixes. The file-type appropriate suffix will be added automatically.
- -a, --analysis_types
Possible choices: energy, type, mirna, target, fold
Analysis type(s) to perform on the input hyb and fold files.
Default: "fold"
- -n, --analysis_name
Name / title of analysis data.
- -p, --make_plots
Possible choices: True, False
Create plots of analysis output.
Default: True
- --version
Print version and exit.
- -v, --verbose
Print verbose output during run.
Default: False
- -s, --silent
Print no output during run.
Default: False
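Because -i and -f each accept several paths, it helps to picture how the two lists might line up. The following is a minimal sketch, assuming (not confirmed here) that hyb and fold files are paired by position and that only the ".hyb", ".vienna" and ".ct" suffixes named above are accepted. The function name `pair_inputs` is hypothetical and not part of hybkit.

```python
import os

# Fold-file suffixes named in the -f/--in_fold description.
FOLD_SUFFIXES = (".vienna", ".ct")


def pair_inputs(hyb_paths, fold_paths):
    """Hypothetical helper: pair each hyb file with the fold file at the same position.

    Assumption: -i and -f lists are matched by order, so equal lengths are
    required. Returns a list of (hyb_path, fold_path, fold_format) tuples.
    """
    if len(hyb_paths) != len(fold_paths):
        raise ValueError("Provide exactly one fold file per hyb file")
    pairs = []
    for hyb_path, fold_path in zip(hyb_paths, fold_paths):
        if not hyb_path.lower().endswith(".hyb"):
            raise ValueError(f"Expected a .hyb file: {hyb_path!r}")
        ext = os.path.splitext(fold_path)[1].lower()
        if ext not in FOLD_SUFFIXES:
            raise ValueError(f"Expected a .vienna or .ct file: {fold_path!r}")
        # Record which fold format the file is in, e.g. "vienna" or "ct".
        pairs.append((hyb_path, fold_path, ext.lstrip(".")))
    return pairs
```

For the two example calls at the top of this page, `pair_inputs(["my_file_1.hyb", "my_file_2.hyb"], ["my_file_1.vienna", "my_file_2.ct"])` would tag the first pair as "vienna" and the second as "ct".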
Hyb Record Settings
- --mirna_types
Values of the "seg_type" field that identify a segment as a miRNA.
Default: ['miRNA', 'microRNA']
- --custom_flags
Custom flags to allow in addition to those specified in the hybkit specification.
Default: []
- --hyb_placeholder
Placeholder character/string for missing data in hyb files.
Default: "."
- --reorder_flags
Possible choices: True, False
Re-order flags to the hybkit-specification order when writing hyb records.
Default: True
- --allow_undefined_flags
Possible choices: True, False
Allow use of flags not defined in the hybkit specification when reading and writing hyb records. As the preferred alternative to this setting, the --custom_flags argument can be used to supply additional allowed flags.
Default: False
- --allow_unknown_seg_types
Possible choices: True, False
Allow unknown segment types when assigning segment types.
Default: False
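The flag-related settings above interact: custom flags extend the allowed set, undefined flags are rejected unless explicitly allowed, and allowed flags may be re-ordered on output. The sketch below illustrates that logic, plus the --mirna_types check. It is an illustration under assumptions, not hybkit's implementation. `SPEC_FLAGS` is an assumed example subset of flag names (see the Hyb File Specification for the real list and order), and `check_flags` and `is_mirna` are hypothetical names.

```python
# Assumed example subset of specification flag names, for illustration only.
SPEC_FLAGS = ["count_total", "read_count", "seg1_type", "seg2_type"]


def check_flags(flags, custom_flags=(), allow_undefined_flags=False,
                reorder_flags=True):
    """Hypothetical: validate a record's flags and optionally reorder them.

    flags: dict of flag name -> value, in the order they were read.
    Raises ValueError for a flag that is neither a specification flag nor a
    custom flag, unless allow_undefined_flags is True.
    """
    allowed = SPEC_FLAGS + [f for f in custom_flags if f not in SPEC_FLAGS]
    if not allow_undefined_flags:
        for name in flags:
            if name not in allowed:
                raise ValueError(f"Undefined flag: {name!r}")
    if not reorder_flags:
        return dict(flags)
    # Known flags first, in specification (then custom) order; any undefined
    # flags follow in their original relative order.
    ordered = {name: flags[name] for name in allowed if name in flags}
    ordered.update({k: v for k, v in flags.items() if k not in ordered})
    return ordered


def is_mirna(seg_type, mirna_types=("miRNA", "microRNA")):
    """Hypothetical: True if seg_type is one of the --mirna_types values."""
    return seg_type in mirna_types
```

In this sketch, `check_flags({"my_flag": "1"})` raises `ValueError`, while passing `custom_flags=["my_flag"]` accepts it. This mirrors the recommendation to prefer --custom_flags over --allow_undefined_flags.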
Hyb File Settings
- --hybformat_id
Possible choices: True, False
The Hyb Software Package places further information in the "id" field of the hybrid record that can be used to infer the number of reads the record represents. When set to True, the identifiers will be parsed as: "<read_id>_<read_count>"
Default: False
- --hybformat_ref
Possible choices: True, False
The Hyb Software Package uses a reference database with identifiers that contain sequence type and other sequence information. When set to True, all hyb file identifiers will be parsed as: "<gene_id>_<transcript_id>_<gene_name>_<seg_type>"
Default: False
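The two identifier layouts described above can be illustrated with plain string splitting. This is a minimal sketch only. The functions `parse_hybformat_id` and `parse_hybformat_ref` are hypothetical, and the handling of underscores that appear inside a field is an assumption, noted in each docstring.

```python
def parse_hybformat_id(record_id):
    """Hypothetical: split "<read_id>_<read_count>" into (read_id, read_count).

    Assumption: the count follows the last underscore, so a read_id that
    itself contains underscores is kept intact.
    """
    read_id, sep, count = record_id.rpartition("_")
    if not sep or not read_id or not count.isdigit():
        raise ValueError(f"Not of the form <read_id>_<read_count>: {record_id!r}")
    return read_id, int(count)


def parse_hybformat_ref(ref_name):
    """Hypothetical: split "<gene_id>_<transcript_id>_<gene_name>_<seg_type>".

    Assumption: no individual field contains an underscore, so exactly four
    underscore-separated fields are required.
    """
    fields = ref_name.split("_")
    if len(fields) != 4:
        raise ValueError(f"Expected 4 underscore-separated fields: {ref_name!r}")
    keys = ("gene_id", "transcript_id", "gene_name", "seg_type")
    return dict(zip(keys, fields))
```

For example, `parse_hybformat_id("48908_12")` gives `("48908", 12)`. The read count recovered this way is the kind of value that a "reads" counting mode would rely on.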
Fold Record Settings
- --allowed_mismatches
For DynamicFoldRecords, the number of sequence mismatches allowed when pairing with a HybRecord.
Default: 0
- --fold_placeholder
Placeholder character/string for missing data for reading/writing fold records.
Default: "."
- -y, --seq_type
Possible choices: static, dynamic
Type of fold record object to use. Options: "static": FoldRecord, requires an exact sequence match to be paired with a HybRecord; "dynamic": DynamicFoldRecord, requires a sequence match to the "dynamic" annotated regions of a HybRecord, and may be shorter/longer than the original sequence.
Default: "static"
- --error_mode
Possible choices: raise, warn_return, return
Mode for handling errors during reading of HybFiles (overridden by HybFoldIter.settings['iter_error_mode'] when using HybFoldIter). Options: "raise": Raise an error when encountered and exit the program; "warn_return": Print a warning and return the error value; "return": Return the error value with no program output.
Default: "raise"
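The difference between "static" and "dynamic" fold records, and the role of --allowed_mismatches, can be sketched as a sequence comparison. This sketch assumes that the dynamic-region sequence has already been assembled from the HybRecord's annotations; how hybkit builds it is not shown here. `fold_matches` and `count_mismatches` are hypothetical names.

```python
def count_mismatches(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def fold_matches(fold_seq, expected_seq, seq_type="static",
                 allowed_mismatches=0):
    """Hypothetical: decide whether a fold record's sequence can pair with a hyb record.

    expected_seq is the full hybrid sequence for "static", or (assumption)
    the sequence already assembled from the record's "dynamic" annotated
    regions for "dynamic".
    """
    if seq_type == "static":
        # FoldRecord: an exact sequence match is required.
        return fold_seq == expected_seq
    if seq_type == "dynamic":
        # DynamicFoldRecord: tolerate up to allowed_mismatches substitutions.
        if len(fold_seq) != len(expected_seq):
            return False
        return count_mismatches(fold_seq, expected_seq) <= allowed_mismatches
    raise ValueError(f"Unknown seq_type: {seq_type!r}")
```

With the default `allowed_mismatches=0`, the dynamic check in this sketch behaves like the static one. Raising the value tolerates a limited number of substitutions.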
Hyb-Fold Iterator Settings
- --error_checks
Possible choices: hybrecord_indel, foldrecord_nofold, max_mismatch, energy_mismatch
Error checks for simultaneous HybFile and FoldFile parsing. Options: "hybrecord_indel": Error for HybRecord objects where one or both sequences have insertions/deletions in the alignment, which prevents matching of sequences; "foldrecord_nofold": Error when a fold_record object cannot be read; "max_mismatch": Error when the number of mismatches between HybRecord and FoldRecord sequences is greater than the FoldRecord "allowed_mismatches" setting; "energy_mismatch": Error when a mismatch exists between HybRecord and FoldRecord energy values.
Default: ['hybrecord_indel', 'foldrecord_nofold', 'max_mismatch', 'energy_mismatch']
- --iter_error_mode
Possible choices: raise, warn_return, warn_skip, skip, return
Mode for handling errors found during error checks. Overrides the "error_mode" setting (see --error_mode) when using HybFoldIter. Options: "raise": Raise an error when encountered; "warn_return": Print a warning and return the value; "warn_skip": Print a warning and continue to the next iteration; "skip": Continue to the next iteration without any output; "return": Return the value without any error output.
Default: "warn_skip"
- --max_sequential_skips
Maximum number of records (or record pairs) to skip in a row. This is limited because many sequential skips usually indicate an issue with record formatting or a desynchronization between files.
Default: 100
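The interplay of --iter_error_mode and --max_sequential_skips might look roughly like the generator below. It is a sketch under stated assumptions, not HybFoldIter itself. `iterate_pairs` is hypothetical, and `check` stands in for whichever --error_checks are enabled, returning an error message or `None`.

```python
import warnings

ITER_ERROR_MODES = {"raise", "warn_return", "warn_skip", "skip", "return"}


def iterate_pairs(pairs, check, iter_error_mode="warn_skip",
                  max_sequential_skips=100):
    """Hypothetical: yield (hyb, fold) pairs, handling failed checks per mode.

    pairs: iterable of (hyb_record, fold_record) tuples.
    check: callable returning an error message, or None if the pair passes
           every enabled error check (e.g. "energy_mismatch").
    """
    if iter_error_mode not in ITER_ERROR_MODES:
        raise ValueError(f"Unknown iter_error_mode: {iter_error_mode!r}")
    sequential_skips = 0
    for hyb, fold in pairs:
        error = check(hyb, fold)
        if error is not None:
            if iter_error_mode == "raise":
                raise RuntimeError(error)
            if iter_error_mode.startswith("warn"):
                warnings.warn(error)
            if iter_error_mode in ("warn_skip", "skip"):
                sequential_skips += 1
                # Many skips in a row usually signal desynchronized files.
                if sequential_skips > max_sequential_skips:
                    raise RuntimeError(
                        f"Exceeded {max_sequential_skips} sequential skips")
                continue
        # Passing pairs, and failing pairs under "warn_return"/"return".
        sequential_skips = 0
        yield hyb, fold
```

In this sketch the skip counter resets whenever a pair is yielded. Only an unbroken run of failures can therefore trip the limit.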
Analysis Settings
- --quant_mode
Possible choices: single, reads, records
Method for counting records. Options: "single": Count each record as a single entry; "reads": Use the number of reads per hyb record as the count (may include PCR duplicates); "records": Count the number of records represented by each hyb record entry (1 for "unmerged" records, >= 1 for "merged" records).
Default: "single"
- --out_delim
Delimiter-string to place between fields in analysis output.
Default: ","
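The three --quant_mode options differ only in how much each record contributes to a tally. The sketch below shows this, together with --out_delim. The record layout (dicts with "key", "read_count" and "record_count") and the functions `count_by_type` and `to_delimited` are hypothetical illustrations, not hybkit's data model.

```python
from collections import Counter


def count_by_type(records, quant_mode="single"):
    """Hypothetical: tally records by key under the three --quant_mode options.

    Each record is an assumed dict with:
      "key":          the category being counted (e.g. a segment type),
      "read_count":   reads represented by the record,
      "record_count": records merged into this entry (1 if unmerged).
    """
    counts = Counter()
    for rec in records:
        if quant_mode == "single":
            n = 1
        elif quant_mode == "reads":
            n = rec["read_count"]
        elif quant_mode == "records":
            n = rec["record_count"]
        else:
            raise ValueError(f"Unknown quant_mode: {quant_mode!r}")
        counts[rec["key"]] += n
    return counts


def to_delimited(counts, out_delim=","):
    """Render counts as "key<out_delim>count" lines, sorted by key."""
    return "\n".join(f"{key}{out_delim}{n}" for key, n in sorted(counts.items()))
```

For two merged-record miRNA entries with read counts 5 and 3, "single" counts 2, "reads" counts 8, and "records" sums their record counts.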
- Output File Naming:
Output files can be named in two fashions: via automatic name generation, or by providing specific out file names.
- Automatic Name Generation:
For automatic output name generation, the following default naming scheme is used:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --> OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
This output file path can be modified with the arguments {--out_dir, --out_suffix} (see Named Arguments).
The output directory defaults to the current working directory ($PWD), and can be modified with the --out_dir <dir> argument. Note: The provided directory must exist, or an error will be raised. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_dir MY_OUT_DIR --> MY_OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB
The suffix used for output files is based on the primary actions of the script. It can be specified using --out_suffix <suffix>. This can optionally include the ".hyb" final suffix. For example:
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX --> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
#OR
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX.HYB --> OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
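The automatic naming scheme can be expressed in a few lines of path handling. In this minimal sketch, `auto_out_path` is a hypothetical name, and the lowercase ".hyb" default for `ext` is an assumption (the examples use uppercase placeholders). A suffix that already ends in the extension is not doubled, matching the MY_SUFFIX / MY_SUFFIX.HYB equivalence shown in the examples.

```python
import os


def auto_out_path(in_path, out_suffix, out_dir=None, ext=".hyb"):
    """Hypothetical: build OUT_DIR/<input stem>_<out_suffix><ext>.

    out_dir defaults to the current working directory and must exist.
    A trailing ext on out_suffix (any case) is stripped before re-adding it.
    """
    if out_dir is None:
        out_dir = os.getcwd()
    if not os.path.isdir(out_dir):
        raise FileNotFoundError(f"Output directory does not exist: {out_dir}")
    stem = os.path.splitext(os.path.basename(in_path))[0]
    if out_suffix.lower().endswith(ext.lower()):
        out_suffix = out_suffix[: -len(ext)]
    return os.path.join(out_dir, f"{stem}_{out_suffix}{ext}")
```

For example, both `auto_out_path("PATH_TO/MY_FILE_1.HYB", "MY_SUFFIX")` and `auto_out_path("PATH_TO/MY_FILE_1.HYB", "MY_SUFFIX.HYB")` resolve to `MY_FILE_1_MY_SUFFIX.hyb` in the current directory.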
- Specific Output Names:
Alternatively, specific file names can be provided via the -o/--out_hyb argument, in which case the number of output names must equal the number of input files. This argument takes precedence over all automatic output file naming options (--out_dir, --out_suffix), which are ignored if -o/--out_hyb is provided. For example:
hyb_script [...] --out_hyb MY_OUT_DIR/OUT_FILE_1.HYB MY_OUT_DIR/OUT_FILE_2.HYB
--> MY_OUT_DIR/OUT_FILE_1.HYB
--> MY_OUT_DIR/OUT_FILE_2.HYB
Note: The directory provided with output file paths (MY_OUT_DIR above) must exist, otherwise an error will be raised.
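The two rules for explicitly named outputs (one output per input, and an existing output directory) can be checked up front. This is a hedged sketch. `check_specific_out_paths` is a hypothetical name, and treating a bare file name as relative to the current directory is an assumption.

```python
import os


def check_specific_out_paths(in_paths, out_paths):
    """Hypothetical: validate explicitly provided output paths.

    Requires one output path per input path, and that each output path's
    directory already exists. Returns a list of (in_path, out_path) pairs.
    """
    if len(in_paths) != len(out_paths):
        raise ValueError(
            f"Got {len(in_paths)} input(s) but {len(out_paths)} output(s)")
    for out_path in out_paths:
        # A bare file name is treated as relative to the current directory.
        out_dir = os.path.dirname(out_path) or "."
        if not os.path.isdir(out_dir):
            raise FileNotFoundError(f"Output directory does not exist: {out_dir}")
    return list(zip(in_paths, out_paths))
```

Validating before any analysis runs means a missing directory fails fast, rather than after the input files have been processed.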