hybkit.type_finder

hybkit TypeFinder Class.

This module contains the TypeFinder class to work with HybRecord to parse sequence identifiers to identify sequence type.

class hybkit.type_finder.TypeFinder

Class for parsing identifiers to identify sequence 'type'.

Designed to be used by the hybkit.HybRecord class.

Variables

params (dict) -- Stored parameters for string parsing, where applicable.

find_with_params = None

Placeholder for storing active method, set with set_method() (see set_method() for details).

params = None

Placeholder for parameters for active method, set with set_method() (see set_method() for details).

default_method = 'hybformat'

Default method assigned using check_set_method()

methods = {'hybformat': 'method_hybformat', 'id_map': 'method_id_map', 'string_match': 'method_string_match'}

Dict of provided methods available to assign segment types

'hybformat'     -- method_hybformat()
'string_match'  -- method_string_match()
'id_map'        -- method_id_map()

param_methods = {'hybformat': None, 'id_map': 'make_id_map_params', 'string_match': 'make_string_match_params'}

Dict of param generation methods for type finding methods

'hybformat'     -- N/A
'string_match'  -- make_string_match_params()
'id_map'        -- make_id_map_params()

param_methods_needs_file = {'hybformat': False, 'id_map': True, 'string_match': True}

Dict of whether parameter generation methods need an input file

'hybformat'     -- False
'string_match'  -- True
'id_map'        -- True

classmethod set_method(method: str, params: Optional[Dict[str, Any]] = None) -> None

Select method to use when finding types.

Available methods are listed in methods.

Parameters
  • method (str) -- Method option from methods to set for use as find().

  • params (dict, optional) -- Dict of parameters for the selected method to use.

classmethod method_is_set() -> bool

Return whether a TypeFinder method has been set.

Methods should be set with set_method().

Returns

True if a method has been set, False otherwise.

Return type

bool

classmethod check_set_method() -> None

If no TypeFinder method has been set, set the method to default_method.

classmethod find(seg_props: Dict[str, Union[float, int, str]]) -> Optional[str]

Find type of segment using TypeFinder.find_custom_method().

If a TypeFinder method has been set with set_method(), use the current parameters set in params to find the type of the provided segment by calling:

seg_type = TypeFinder.find_custom_method(seg_props, TypeFinder.params)

Parameters

seg_props (dict) -- seg_props from hybkit.HybRecord

Returns

Type of the provided segment, or None if a type cannot be identified.

Return type

str
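The class-level dispatch described for set_method(), method_is_set(), check_set_method(), and find() can be pictured with a small stand-in class. This is a minimal sketch and not hybkit's implementation: MiniTypeFinder is hypothetical, it registers only one method, and the 'ref_name' key used to read the segment identifier is an assumption about the seg_props dict.

```python
from typing import Any, Callable, Dict, Optional


class MiniTypeFinder:
    """Hypothetical stand-in illustrating class-level method dispatch."""

    find_with_params: Optional[Callable] = None  # active method, unset at first
    params: Optional[Dict[str, Any]] = None      # parameters for the active method
    default_method = 'hybformat'
    methods = {'hybformat': 'method_hybformat'}  # option name -> attribute name

    @classmethod
    def set_method(cls, method: str, params: Optional[Dict[str, Any]] = None) -> None:
        if method not in cls.methods:
            raise ValueError(f'Unknown method: {method!r}')
        # Wrap in staticmethod so the stored function is never bound on lookup.
        cls.find_with_params = staticmethod(getattr(cls, cls.methods[method]))
        cls.params = params

    @classmethod
    def method_is_set(cls) -> bool:
        return cls.find_with_params is not None

    @classmethod
    def check_set_method(cls) -> None:
        # Fall back to the default only when nothing has been selected yet.
        if not cls.method_is_set():
            cls.set_method(cls.default_method)

    @classmethod
    def find(cls, seg_props: Dict[str, Any]) -> Optional[str]:
        return cls.find_with_params(seg_props, cls.params)

    @staticmethod
    def method_hybformat(seg_props: Dict[str, Any],
                         params: Optional[dict] = None) -> Optional[str]:
        seg_id = str(seg_props['ref_name'])  # assumed key for the identifier
        if '_' not in seg_id:
            return None
        return seg_id.split('_')[-1]


MiniTypeFinder.check_set_method()
found = MiniTypeFinder.find({'ref_name': 'MIMAT0000076_MirBase_miR-21_microRNA'})
```

Keeping the active method and its parameters on the class means every caller shares one configuration, which matches the "set once, then call find()" usage described here.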

classmethod set_custom_method(method: Callable, params: Optional[dict] = None) -> None

Set a custom method to use for finding segment types.

This method is for providing a custom function. To use the included functions, use set_method(). Custom functions provided must have the signature:

seg_type = custom_method(self, seg_props, params)

This function should return the string of the assigned segment type if found, or a None object if the type cannot be found. It can also take a dictionary in the "params" argument that specifies additional or dynamic search properties, as desired.

Parameters
  • method (method) -- Method to set for use.

  • params (dict, optional) -- dict of custom parameters to set for use.
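As an illustration of the required signature, the sketch below defines a custom function that assigns a type from an identifier prefix. The function name, the 'prefixes' parameter key, and the 'ref_name' key used to read the identifier from seg_props are all assumptions made for this example.

```python
from typing import Any, Dict, Optional


def prefix_type_finder(self, seg_props: Dict[str, Any],
                       params: Optional[Dict[str, Any]] = None) -> Optional[str]:
    """Hypothetical custom method: assign a type from an identifier prefix."""
    prefixes = (params or {}).get('prefixes', {})
    seg_id = str(seg_props.get('ref_name', ''))  # 'ref_name' key is an assumption
    for prefix, seg_type in prefixes.items():
        if seg_id.startswith(prefix):
            return seg_type
    return None  # type could not be identified


custom_params = {'prefixes': {'MIMAT': 'microRNA'}}

# Called directly here; `self` is unused, so None stands in for it.
result = prefix_type_finder(
    None, {'ref_name': 'MIMAT0000076_MirBase_miR-21_microRNA'}, custom_params)
```

With hybkit available, a function like this would be registered with TypeFinder.set_custom_method(prefix_type_finder, custom_params).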

static method_hybformat(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) -> Optional[str]

Return the type of the provided segment, or None if segment cannot be identified.

This method works with sequence / alignment mapping identifiers in the format of the reference database provided by the Hyb Software Package, specifically identifiers of the format:

<gene_id>_<transcript_id>_<gene_name>_<seg_type>

This method returns the last component of the identifier, split by "_", as the identified sequence type. It returns None if the segment identifier does not contain "_".

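Example

"MIMAT0000076_MirBase_miR-21_microRNA"  --->  "microRNA".

Parameters
  • seg_props (dict) -- HybRecord segment properties dict to evaluate.

  • params (dict, optional) -- Unused by this method.

static method_string_match(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) -> Optional[str]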

Return the type of the provided segment, or None if unidentified.

This method attempts to find a string matching a specific pattern within the identifier of the aligned segment. Search options include "startswith", "contains", "endswith", and "matches"; the first type matching the criteria is returned. The required params dict should contain a key for each desired search type, each mapped to a list of 2-tuples of (search string, assigned type).

Example

params = {'endswith': [('_miR', 'microRNA'),
                       ('_trans', 'mRNA')]}

This dict can be generated with the associated make_string_match_params() method and an associated csv legend file with format:

#comment line
#search_type,search_string,seg_type
endswith,_miR,microRNA
endswith,_trans,mRNA

Parameters
  • seg_props (dict) -- HybRecord segment properties dict to evaluate.

  • params (dict, optional) -- Dict with search parameters as described above.
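The matching logic described above can be sketched as a standalone function. This is an illustration, not hybkit's code: the order in which search types are tried, the reading of "matches" as exact equality, and the 'ref_name' key holding the identifier are assumptions.

```python
from typing import Any, Dict, List, Optional, Tuple

StringMatchParams = Dict[str, List[Tuple[str, str]]]


def string_match_type(seg_props: Dict[str, Any],
                      params: StringMatchParams) -> Optional[str]:
    """Sketch: return the first seg_type whose search string matches."""
    seg_id = str(seg_props['ref_name'])  # assumed key for the identifier
    checks = {
        'startswith': seg_id.startswith,
        'contains': lambda text: text in seg_id,
        'endswith': seg_id.endswith,
        'matches': lambda text: text == seg_id,  # assumed: exact equality
    }
    # Search types are tried in the order listed above (an assumption).
    for search_type, check in checks.items():
        for search_string, seg_type in params.get(search_type, []):
            if check(search_string):
                return seg_type
    return None


match_params = {'endswith': [('_miR', 'microRNA'),
                             ('_trans', 'mRNA')]}
mirna_type = string_match_type({'ref_name': 'hsa-let-7a_miR'}, match_params)
```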

static make_string_match_params(legend_file: str) -> dict

Read csv and return a dict of search parameters for method_string_match().

The csv file passed as legend_file should have the format:

#comment line
#search_type,search_string,seg_type
endswith,_miR,microRNA
endswith,_trans,mRNA

Search_type options include "startswith", "contains", "endswith", and "matches". The produced dict object contains a key for each search type, with a list of 2-tuples for each search-string and associated segment-type.
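A legend in this format could be parsed along the following lines. This is a sketch under stated assumptions: read_string_match_legend is a hypothetical helper, and it takes an open file handle rather than a path so that the example runs without a file on disk.

```python
import csv
import io
from typing import Dict, List, Tuple


def read_string_match_legend(handle) -> Dict[str, List[Tuple[str, str]]]:
    """Sketch: parse rows into {search_type: [(search_string, seg_type), ...]}."""
    allowed = {'startswith', 'contains', 'endswith', 'matches'}
    params: Dict[str, List[Tuple[str, str]]] = {}
    for row in csv.reader(handle):
        if not row or row[0].startswith('#'):
            continue  # skip blank lines and comment lines
        search_type, search_string, seg_type = (field.strip() for field in row)
        if search_type not in allowed:
            raise ValueError(f'Unknown search type: {search_type!r}')
        params.setdefault(search_type, []).append((search_string, seg_type))
    return params


legend_text = """#comment line
#search_type,search_string,seg_type
endswith,_miR,microRNA
endswith,_trans,mRNA
"""
legend_params = read_string_match_legend(io.StringIO(legend_text))
```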

For example:

{'endswith': [('_miR', 'microRNA'),
              ('_trans', 'mRNA')]}

static method_id_map(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) -> Optional[str]

Return the type of the provided segment or None if it cannot be identified.

This method checks to see if the identifier of the segment is present in the params dict. params should be formatted as a dict with keys as sequence identifier names, and the corresponding type as the respective values.

Example

params = {'MIMAT0000076_MirBase_miR-21_microRNA': 'microRNA',
          'ENSG00000XXXXXX_NR003287-2_RN28S1_rRNA': 'rRNA'}

This dict can be generated with the associated make_id_map_params() method.

Parameters
  • seg_props (dict) -- HybRecord segment properties dict to evaluate.

  • params (dict) -- Dict of mapping of sequence identifiers to sequence types.

Returns

Identified sequence type, or None if it cannot be found.

Return type

str

static make_id_map_params(mapped_id_files: List[str]) -> dict

Read file(s) into a mapping of sequence identifiers to sequence types.

This method reads one or more files into a dict for use with the method_id_map() method. Pass a file path (or a list/tuple of file paths) as mapped_id_files. Files listed in the mapped_id_files argument should have the format:

#comment line
#seg_id,seg_type
segA_unique_id,segA_type
segB_unique_id,segB_type

Parameters

mapped_id_files (str, list, or tuple) -- Iterable object containing strings of paths to files containing id/type mapping information.
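Reading such files into a lookup dict might look like the sketch below. The helper name read_id_map is hypothetical, and raising an error when an identifier is mapped to two different types is a choice made for this example rather than documented behavior.

```python
import os
import tempfile
from typing import Dict, Iterable, Union


def read_id_map(mapped_id_files: Union[str, Iterable[str]]) -> Dict[str, str]:
    """Sketch: merge one or more id/type files into a single dict."""
    if isinstance(mapped_id_files, str):
        mapped_id_files = [mapped_id_files]  # accept a single path
    id_map: Dict[str, str] = {}
    for path in mapped_id_files:
        with open(path) as handle:
            for line in handle:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue  # skip blank lines and comment lines
                seg_id, seg_type = line.split(',')
                # Assumed policy: conflicting assignments are an error.
                if id_map.get(seg_id, seg_type) != seg_type:
                    raise ValueError(f'Conflicting types for {seg_id!r}')
                id_map[seg_id] = seg_type
    return id_map


with tempfile.TemporaryDirectory() as tmp_dir:
    map_path = os.path.join(tmp_dir, 'id_map.csv')
    with open(map_path, 'w') as handle:
        handle.write('#comment line\n#seg_id,seg_type\n'
                     'segA_unique_id,segA_type\nsegB_unique_id,segB_type\n')
    id_map = read_id_map(map_path)

# Lookup as described for method_id_map(): None when the identifier is absent.
seg_type = id_map.get('segA_unique_id')
```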