hyb_filter

Read one or more ‘.hyb’ format files and output only the records that meet (or are not excluded by) specific criteria.

This script takes one or more filter and/or exclusion criteria and outputs only those records that match the filter criteria and match none of the exclusion criteria.

The filter criteria and options are based on the options provided by the hybkit.HybRecord.has_prop() method of the Hybkit API.

Example System Calls:
hyb_filter -i my_file_1.hyb --filter has_seg_types
# Outputs records that have completed a segtype analysis

hyb_filter -i my_file_1.hyb --filter seg_type mRNA
# Outputs records where either segment has a segtype of mRNA

hyb_filter -i my_file_1.hyb --exclude seg_type mRNA
# Outputs records where neither segment has a segtype of mRNA

hyb_filter -i my_file_1.hyb --filter seg1_type mRNA
# Outputs only records where the first (5p) segment has a segtype of mRNA

hyb_filter -i my_file_1.hyb --filter seg_type_contains RNA
# Outputs all records with a segtype that includes
#   the string "RNA" (case-sensitive)

hyb_filter -i my_file_1.hyb --filter seg_contains kshv
# Outputs records where either segment identifier contains
#   the string: "kshv" (case-sensitive)

Multiple filtering options can be used together. The -m / --filter_mode argument determines whether “all” (DEFAULT) or “any” of the filters must be true for a record to be included. Note: Matching any exclusion criterion results in exclusion of the record.
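
The way filters and exclusions combine can be sketched in a few lines of Python. This is an illustration of the rule described above, not the script's actual implementation; the function name keep_record and its boolean-list inputs are hypothetical, and the behavior with no inclusion filters is an assumption.

```python
def keep_record(filter_results, exclude_results, mode):
    """Decide whether a record would be written to the output.

    filter_results / exclude_results: lists of booleans, one per
    --filter* / --exclude* criterion evaluated on the record.
    mode: "any" or "all", mirroring -m / --filter_mode.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    # Matching any exclusion criterion always removes the record.
    if any(exclude_results):
        return False
    # Assumption: with no inclusion filters, nothing else restricts the record.
    if not filter_results:
        return True
    # "any" needs one true filter; "all" needs every filter to be true.
    combine = any if mode == "any" else all
    return combine(filter_results)
```

For instance, a record matching only one of two filters is kept under "any" and dropped under "all", and a record matching an exclusion criterion is dropped in both modes.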

Example System Calls (match ALL criteria):
hyb_filter -i my_file_1.hyb --filter seg_contains kshv \
           --filter_2 seg_type miRNA
# Outputs records with either reference sequence identifier containing "kshv"
#   and with a segtype of miRNA

Example System Calls (match ANY criteria):
hyb_filter -i my_file_1.hyb --filter_mode any \
           --filter seg_type miRNA \
           --filter_2 seg_type lncRNA
# Outputs records containing either segment type matching
#   either "miRNA" or "lncRNA" (case-sensitive)

Output File Naming:

Output files can be named in two fashions: via automatic name generation, or by providing specific out file names.

Automatic Name Generation:

When output names are generated automatically, the following default naming scheme is used:

hyb_script -i PATH_TO/MY_FILE_1.HYB [...]
    -->  OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB

This output file path can be modified with the arguments --out_dir and --out_suffix, described below.

The output directory defaults to the current working directory ($PWD), and can be modified with the --out_dir <dir> argument. Note: The provided directory must exist, or an error will be raised. For example:

hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_dir MY_OUT_DIR
    -->  MY_OUT_DIR/MY_FILE_1_ADDSUFFIX.HYB

The suffix used for output files is based on the primary actions of the script. It can be specified using --out_suffix <suffix>. The suffix can optionally include the final “.hyb” extension. For example:

hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX
    -->  OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
#OR
hyb_script -i PATH_TO/MY_FILE_1.HYB [...] --out_suffix MY_SUFFIX.HYB
    -->  OUT_DIR/MY_FILE_1_MY_SUFFIX.HYB
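
The automatic naming rules above can be approximated with the short Python sketch below. It is an illustration only: the function auto_out_path is hypothetical, the suffix is assumed to be passed with its leading underscore (as in the documented default “_filtered”), and the case-insensitive handling of an optional “.hyb” ending is an assumption.

```python
from pathlib import Path


def auto_out_path(in_path, out_dir=".", out_suffix="_filtered"):
    """Build an output path from an input path, directory, and suffix."""
    in_path = Path(in_path)
    ext = in_path.suffix  # for example ".hyb" or ".HYB"
    # If the suffix already ends with the file extension, drop it so the
    # extension is not written twice (assumed behavior).
    if ext and out_suffix.lower().endswith(ext.lower()):
        out_suffix = out_suffix[: -len(ext)]
    # Input stem + suffix + original extension, placed in the output directory.
    return Path(out_dir) / (in_path.stem + out_suffix + ext)


print(auto_out_path("PATH_TO/my_file_1.hyb", "OUT_DIR", "_my_suffix").as_posix())
print(auto_out_path("PATH_TO/my_file_1.hyb", "OUT_DIR", "_my_suffix.hyb").as_posix())
```

Both calls produce the same path, mirroring the two --out_suffix forms shown above.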

Specific Output Names:

Alternatively, specific file names can be provided via the -o/--out_hyb argument, ensuring that the same number of input and output files are provided. This argument takes precedence over all automatic output file naming options (--out_dir, --out_suffix), which are ignored if -o/--out_hyb is provided. For example:

hyb_script [...] --out_hyb MY_OUT_DIR/OUT_FILE_1.HYB MY_OUT_DIR/OUT_FILE_2.HYB
    -->  MY_OUT_DIR/OUT_FILE_1.HYB
    -->  MY_OUT_DIR/OUT_FILE_2.HYB

Note: The directory provided with output file paths (MY_OUT_DIR above) must exist, otherwise an error will be raised.
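
The two requirements for explicit output names (matching counts, existing directories) could be checked as in the sketch below. The helper check_out_paths and the exception types it raises are hypothetical and chosen for illustration; the script's own checks and error messages may differ.

```python
from pathlib import Path


def check_out_paths(in_paths, out_paths):
    """Pair each input file with its explicitly named output file."""
    # The same number of input and output files must be provided.
    if len(in_paths) != len(out_paths):
        raise ValueError(
            f"{len(in_paths)} input file(s) but {len(out_paths)} output file(s)"
        )
    # The directory of every output path must already exist.
    for out_path in out_paths:
        parent = Path(out_path).parent
        if not parent.is_dir():
            raise FileNotFoundError(f"Output directory does not exist: {parent}")
    return list(zip(in_paths, out_paths))
```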

usage: hyb_filter [-h] -i PATH_TO/MY_FILE.HYB [PATH_TO/MY_FILE.HYB ...]
                  [-o PATH_TO/OUT_FILE.HYB [PATH_TO/OUT_FILE.HYB ...]]
                  [-d OUT_DIR] [-u OUT_SUFFIX] [-m {any,all}]
                  [--exclusion_table [{True,False}]]
                  [--filter FILTER [FILTER ...]]
                  [--filter_2 FILTER_2 [FILTER_2 ...]]
                  [--filter_3 FILTER_3 [FILTER_3 ...]]
                  [--exclude EXCLUDE [EXCLUDE ...]]
                  [--exclude_2 EXCLUDE_2 [EXCLUDE_2 ...]]
                  [--exclude_3 EXCLUDE_3 [EXCLUDE_3 ...]] [--set_dataset]
                  [-v | -s] [--mirna_types MIRNA_TYPES [MIRNA_TYPES ...]]
                  [--custom_flags CUSTOM_FLAGS [CUSTOM_FLAGS ...]]
                  [--hyb_placeholder HYB_PLACEHOLDER]
                  [--reorder_flags {True,False}]
                  [--allow_undefined_flags [{True,False}]]
                  [--allow_unknown_seg_types [{True,False}]]
                  [--check_complete_seg_types [{True,False}]]
                  [--hybformat_id [{True,False}]]
                  [--hybformat_ref [{True,False}]]

Named Arguments

-i, --in_hyb

REQUIRED path to one or more hyb-format files with a “.hyb” suffix for use in the evaluation.

-o, --out_hyb

Optional path to one or more hyb-format files for output (should include a “.hyb” suffix). If not provided, the input file path “PATH_TO/MY_FILE.HYB” will be used as a template for the output path “OUT_DIR/MY_FILE_OUT.HYB”.

-d, --out_dir

Path to directory for output of evaluation files. Defaults to the current working directory.

Default: $PWD

-u, --out_suffix

Suffix to add to the name of output files, before any file- or analysis-specific suffixes. The file-type appropriate suffix will be added automatically.

Default: “_filtered”

-m, --filter_mode

Possible choices: any, all

Modes for evaluating multiple filters. The “all” mode requires all provided filters to be true for inclusion. The “any” mode requires only one provided filter to be true for inclusion. (Note: matching any exclusion filter is grounds for exclusion of record.)

Default: “all”

--exclusion_table

Possible choices: True, False

Output an “exclusion table” for use with hyb_filter_fold.

Default: False

--filter

Filter criteria #1. Records matching the criteria will be included in output. Includes a filter type, Ex: “seg_name_contains”, and an argument, Ex: “ENST00000340384”. (Note: not all filter types require a second argument, for Example: “has_mirna_seg”)

--filter_2

Filter criteria #2. Records matching the criteria will be included in output. Includes a filter type, Ex: “seg_name_contains”, and an argument, Ex: “ENST00000340384”. (Note: not all filter types require a second argument, for Example: “has_mirna_seg”)

--filter_3

Filter criteria #3. Records matching the criteria will be included in output. Includes a filter type, Ex: “seg_name_contains”, and an argument, Ex: “ENST00000340384”. (Note: not all filter types require a second argument, for Example: “has_mirna_seg”)

--exclude

Exclusion filter criteria #1. Records matching the criteria will be excluded from output. Includes a filter type, Ex: “seg_name_contains”, and an argument, Ex: “ENST00000340384”. (Note: not all filter types require a second argument, for Example: “has_mirna_seg”)

--exclude_2

Exclusion filter criteria #2. Records matching the criteria will be excluded from output. Includes a filter type, Ex: “seg_name_contains”, and an argument, Ex: “ENST00000340384”. (Note: not all filter types require a second argument, for Example: “has_mirna_seg”)

--exclude_3

Exclusion filter criteria #3. Records matching the criteria will be excluded from output. Includes a filter type, Ex: “seg_name_contains”, and an argument, Ex: “ENST00000340384”. (Note: not all filter types require a second argument, for Example: “has_mirna_seg”)
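
As a rough illustration of how a criterion made of a filter type plus an optional argument can be read from the command line, the sketch below rebuilds the six options with Python's argparse. It is not the script's actual parser: the program name, the split_criterion helper, and the limit of one argument per criterion are assumptions.

```python
import argparse

# Hypothetical re-creation of the six criteria options (one or more values each).
parser = argparse.ArgumentParser(prog="hyb_filter_sketch")
for name in ("--filter", "--filter_2", "--filter_3",
             "--exclude", "--exclude_2", "--exclude_3"):
    parser.add_argument(name, nargs="+", default=None)


def split_criterion(values):
    """Split parsed values into (filter_type, argument or None)."""
    if len(values) > 2:
        raise ValueError("expected a filter type and at most one argument")
    filter_type = values[0]
    argument = values[1] if len(values) == 2 else None
    return filter_type, argument


args = parser.parse_args([
    "--filter", "seg_name_contains", "ENST00000340384",
    "--exclude", "has_mirna_seg",
])
print(split_criterion(args.filter))
print(split_criterion(args.exclude))
```

Options that are not supplied (here --filter_2, for example) are left as None and can simply be skipped.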

--set_dataset

Set “dataset” flag to value of the input file name.

Default: False

-v, --verbose

Print verbose output during run.

Default: False

-s, --silent

Print no output during run.

Default: False

Hyb Record Settings

--mirna_types

“seg_type” fields identifying a miRNA

Default: [‘miRNA’, ‘microRNA’]

--custom_flags

Custom flags to allow in addition to those specified in the hybkit specification.

Default: []

--hyb_placeholder

Placeholder character/string for missing data in hyb files.

Default: “.”

--reorder_flags

Possible choices: True, False

Re-order flags to the hybkit-specification order when writing hyb records.

Default: True

--allow_undefined_flags

Possible choices: True, False

Allow use of flags not defined in the hybkit specification when reading and writing hyb records. As the preferred alternative to this setting, the --custom_flags argument can be used to supply custom allowed flags.

Default: False

--allow_unknown_seg_types

Possible choices: True, False

Allow unknown segment types when assigning segment types.

Default: False

--check_complete_seg_types

Possible choices: True, False

Check every segment possibility when assigning segment types, rather than breaking after the first match is found. If True, finding segment types is slower but better at catching errors.

Default: False

Hyb File Settings

--hybformat_id

Possible choices: True, False

The Hyb Software Package places further information in the “id” field of the hybrid record that can be used to infer the read count of the record. When set to True, the identifiers will be parsed as: “<read_id>_<read_count>”

Default: False
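
To make the “<read_id>_<read_count>” layout concrete, here is a minimal parsing sketch. The function parse_hybformat_id is hypothetical, and splitting on the last underscore is an assumption about how such identifiers are best separated; it does not reproduce the library's own parser.

```python
def parse_hybformat_id(record_id):
    """Split "<read_id>_<read_count>" into (read_id, read_count)."""
    # Split on the LAST underscore so a read_id may itself contain "_".
    read_id, sep, count = record_id.rpartition("_")
    if not sep or not read_id or not count.isdigit():
        raise ValueError(f"Not in <read_id>_<read_count> form: {record_id!r}")
    return read_id, int(count)


print(parse_hybformat_id("1257_12"))
```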

--hybformat_ref

Possible choices: True, False

The Hyb Software Package uses a reference database with identifiers that contain sequence type and other sequence information. When set to True, all hyb file identifiers will be parsed as: “<gene_id>_<transcript_id>_<gene_name>_<seg_type>”

Default: False
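
The four-part reference layout can be illustrated in the same way. The function parse_hybformat_ref and the placeholder identifier are hypothetical; treating any extra underscores as part of the gene name is an assumption made for this sketch, not documented behavior.

```python
def parse_hybformat_ref(ref_id):
    """Split "<gene_id>_<transcript_id>_<gene_name>_<seg_type>" into a dict."""
    parts = ref_id.split("_")
    if len(parts) < 4:
        raise ValueError(f"Expected four '_'-separated fields: {ref_id!r}")
    return {
        "gene_id": parts[0],
        "transcript_id": parts[1],
        # Assumption: extra underscores belong to the gene name.
        "gene_name": "_".join(parts[2:-1]),
        "seg_type": parts[-1],
    }


# Placeholder identifier, for illustration only.
print(parse_hybformat_ref("GENE1_TX1_NAME1_mRNA"))
```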