hybkit.type_finder
hybkit TypeFinder Class.
This module contains the TypeFinder class, which works with HybRecord to parse sequence identifiers and identify the sequence type.
- class hybkit.type_finder.TypeFinder
Class for parsing identifiers to identify sequence 'type'.
Designed to be used by the hybkit.HybRecord class.
- Variables
params (dict) -- Stored parameters for string parsing, where applicable.
- find_with_params = None
Placeholder for storing the active method, set with set_method() (see set_method() for details).
- params = None
Placeholder for the parameters of the active method, set with set_method() (see set_method() for details).
- default_method = 'hybformat'
Default method assigned using check_set_method().
- methods = {'hybformat': 'method_hybformat', 'id_map': 'method_id_map', 'string_match': 'method_string_match'}
Dict of provided methods available to assign segment types:
'hybformat': method_hybformat()
'string_match': method_string_match()
'id_map': method_id_map()
- param_methods = {'hybformat': None, 'id_map': 'make_id_map_params', 'string_match': 'make_string_match_params'}
Dict of param generation methods for type finding methods:
'hybformat': N/A
'string_match': make_string_match_params()
'id_map': make_id_map_params()
- param_methods_needs_file = {'hybformat': False, 'id_map': True, 'string_match': True}
Dict of whether parameter generation methods need an input file:
'hybformat': False
'string_match': True
'id_map': True
- classmethod set_method(method: str, params: Optional[Dict[str, Any]] = None) None
Select method to use when finding types.
Available methods are listed in methods.
- classmethod method_is_set() bool
Return whether a TypeFinder method has been set.
Methods should be set with set_method().
- Returns
True if a method has been set, False otherwise.
- Return type
bool
- classmethod check_set_method() None
If no TypeFinder method is set, set the method to default_method.
- classmethod find(seg_props: Dict[str, Union[float, int, str]]) Optional[str]
Find type of segment using TypeFinder.find_custom_method().
If a TypeFinder method has been set with set_method(), use the current parameters set in params to find the type of the provided segment by calling:
seg_type = TypeFinder.find_custom_method(seg_props, TypeFinder.params)
- Parameters
seg_props (dict) -- seg_props from hybkit.HybRecord
- Returns
Type of the provided segment, or None if a type cannot be identified.
- Return type
Optional[str]
- classmethod set_custom_method(method: Callable, params: Optional[dict] = None) None
Set the method to use when finding segment types.
This method is for providing a custom function. To use the included functions, use set_method(). Custom functions provided must have the signature:
seg_type = custom_method(self, seg_props, params)
This function should return the string of the assigned segment type if found, or a None object if the type cannot be found. It can also take a dictionary in the "params" argument that specifies additional or dynamic search properties, as desired.
- Parameters
method (method) -- Method to set for use.
params (dict, optional) -- dict of custom parameters to set for use.
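A minimal sketch of a function with the signature described above may help. The function name find_type_by_prefix, the 'ref_name' key in seg_props, and the 'prefix_types' key in params are all illustrative assumptions and not part of hybkit. The function is called directly here, with None passed for the unused first argument, rather than through the TypeFinder class.

```python
from typing import Any, Dict, Optional


def find_type_by_prefix(self: Any, seg_props: Dict[str, Any],
                        params: Optional[dict] = None) -> Optional[str]:
    """Hypothetical custom finder: assign a type from an identifier prefix."""
    # Assumption: the segment identifier is stored under 'ref_name'.
    seg_name = str(seg_props.get('ref_name', ''))
    # Assumption: params maps prefixes to types under 'prefix_types'.
    prefix_types = (params or {}).get('prefix_types', {})
    for prefix, seg_type in prefix_types.items():
        if seg_name.startswith(prefix):
            return seg_type
    return None  # No type could be identified.


custom_params = {'prefix_types': {'MIMAT': 'microRNA', 'ENSG': 'mRNA'}}
example_props = {'ref_name': 'MIMAT0000076_MirBase_miR-21_microRNA'}
custom_result = find_type_by_prefix(None, example_props, custom_params)
```

A function of this shape, together with its params dict, is the kind of pair that would be passed to set_custom_method().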
- static method_hybformat(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) Optional[str]
Return the type of the provided segment, or None if segment cannot be identified.
This method works with sequence / alignment mapping identifiers in the format of the reference database provided by the Hyb Software Package, specifically identifiers of the format:
<gene_id>_<transcript_id>_<gene_name>_<seg_type>
This method returns the last component of the identifier, split by "_", as the identified sequence type (returns None if the segment identifier does not contain "_").
Example
"MIMAT0000076_MirBase_miR-21_microRNA" ---> "microRNA".
- Parameters
seg_props (dict) -- seg_props from hybkit.HybRecord
params (dict, optional) -- Unused in this method.
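The splitting rule described for method_hybformat() can be sketched in a few lines. This is a standalone illustration and not the hybkit implementation; the helper name last_component_type and the 'ref_name' key used to hold the identifier are assumptions.

```python
from typing import Any, Dict, Optional


def last_component_type(seg_props: Dict[str, Any]) -> Optional[str]:
    """Return the text after the final '_' of the identifier, or None."""
    # Assumption: the identifier is stored under the 'ref_name' key.
    seg_name = str(seg_props.get('ref_name', ''))
    if '_' not in seg_name:
        return None  # No separator, so no type can be read.
    # Split once from the right and keep the last component.
    return seg_name.rsplit('_', 1)[-1]


hyb_type = last_component_type(
    {'ref_name': 'MIMAT0000076_MirBase_miR-21_microRNA'})
```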
- static method_string_match(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) Optional[str]
Return the type of the provided segment, or None if unidentified.
This method attempts to find a string matching a specific pattern within the identifier of the aligned segment. Search options include "startswith", "contains", "endswith", and "matches"; the method returns the first type matching the criteria. The required params dict should contain a key for each desired search type, with a list of 2-tuples pairing each search string with its assigned type.
Example
params = {'endswith': [('_miR', 'microRNA'), ('_trans', 'mRNA') ]}
This dict can be generated with the associated make_string_match_params() method and an associated csv legend file with format:
#comment line
#search_type,search_string,seg_type
endswith,_miR,microRNA
endswith,_trans,mRNA
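The search described for method_string_match() could look roughly like the sketch below. It is a standalone illustration, not the hybkit implementation. The helper name match_type is hypothetical, "matches" is assumed to mean exact equality, and the order in which search types are tried is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

SearchParams = Dict[str, List[Tuple[str, str]]]


def match_type(seg_name: str, params: SearchParams) -> Optional[str]:
    """Return the first type whose search string matches seg_name."""
    # Each search type maps to a test of (identifier, search string).
    checks = {
        'startswith': lambda name, s: name.startswith(s),
        'contains': lambda name, s: s in name,
        'endswith': lambda name, s: name.endswith(s),
        # Assumption: "matches" means exact equality with the identifier.
        'matches': lambda name, s: name == s,
    }
    for search_type, check in checks.items():
        for search_string, seg_type in params.get(search_type, []):
            if check(seg_name, search_string):
                return seg_type
    return None  # No search string matched.


match_params = {'endswith': [('_miR', 'microRNA'), ('_trans', 'mRNA')]}
```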
- static make_string_match_params(legend_file: str) dict
Read csv and return a dict of search parameters for method_string_match().
The my_legend.csv file should have the format:
#comment line
#search_type,search_string,seg_type
endswith,_miR,microRNA
endswith,_trans,mRNA
Search_type options include "startswith", "contains", "endswith", and "matches". The produced dict object contains a key for each search type, with a list of 2-tuples for each search-string and associated segment-type.
For example:
{'endswith': [('_miR', 'microRNA'), ('_trans', 'mRNA') ]}
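As a rough sketch of how a legend in this format turns into the dict above, the following reads the lines from an in-memory text handle rather than a file path. The helper name read_legend is hypothetical, and rejecting unknown search types with ValueError is an assumption about reasonable behavior, not documented hybkit behavior.

```python
import io
from typing import Dict, List, TextIO, Tuple


def read_legend(handle: TextIO) -> Dict[str, List[Tuple[str, str]]]:
    """Parse legend lines into search type -> [(search string, type)]."""
    allowed = {'startswith', 'contains', 'endswith', 'matches'}
    params: Dict[str, List[Tuple[str, str]]] = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # Skip blank lines and comment lines.
        search_type, search_string, seg_type = line.split(',')
        if search_type not in allowed:
            # Assumption: unknown search types are treated as errors.
            raise ValueError('Unknown search type: ' + search_type)
        params.setdefault(search_type, []).append((search_string, seg_type))
    return params


legend_text = (
    '#comment line\n'
    '#search_type,search_string,seg_type\n'
    'endswith,_miR,microRNA\n'
    'endswith,_trans,mRNA\n'
)
legend_params = read_legend(io.StringIO(legend_text))
```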
- static method_id_map(seg_props: Dict[str, Union[float, int, str]], params: Optional[dict] = None) Optional[str]
Return the type of the provided segment or None if it cannot be identified.
This method checks to see if the identifier of the segment is present in the params dict. params should be formatted as a dict with keys as sequence identifier names, and the corresponding type as the respective values.
Example
params = {'MIMAT0000076_MirBase_miR-21_microRNA': 'microRNA', 'ENSG00000XXXXXX_NR003287-2_RN28S1_rRNA': 'rRNA'}
This dict can be generated with the associated make_id_map_params() method.
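The lookup itself is a plain dictionary access, which the sketch below illustrates. It is not the hybkit implementation; the helper name lookup_type and the 'ref_name' key used to hold the segment identifier are assumptions.

```python
from typing import Any, Dict, Optional


def lookup_type(seg_props: Dict[str, Any],
                params: Dict[str, str]) -> Optional[str]:
    """Return the mapped type for the segment identifier, or None."""
    # Assumption: the identifier is stored under the 'ref_name' key.
    seg_name = str(seg_props.get('ref_name', ''))
    # dict.get returns None when the identifier is not in the mapping.
    return params.get(seg_name)


id_map = {
    'MIMAT0000076_MirBase_miR-21_microRNA': 'microRNA',
    'ENSG00000XXXXXX_NR003287-2_RN28S1_rRNA': 'rRNA',
}
```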
- static make_id_map_params(mapped_id_files: List[str]) dict
Read file(s) into a mapping of sequence identifiers.
This method reads one or more files into a dict for use with the method_id_map() method. The method requires passing a file path (or list/tuple of file paths) as mapped_id_files. Files listed in the mapped_id_files argument should have the format:
#comment line
#seg_id,seg_type
segA_unique_id,segA_type
segB_unique_id,segB_type
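A sketch of merging several such files into one mapping follows, using in-memory text handles in place of file paths so that it runs on its own. The helper name read_id_maps is hypothetical, and raising ValueError when one identifier is given two different types is an assumption rather than documented hybkit behavior.

```python
import io
from typing import Dict, Iterable, TextIO


def read_id_maps(handles: Iterable[TextIO]) -> Dict[str, str]:
    """Merge 'seg_id,seg_type' lines from several handles into one dict."""
    id_map: Dict[str, str] = {}
    for handle in handles:
        for line in handle:
            line = line.strip()
            if not line or line.startswith('#'):
                continue  # Skip blank lines and comment lines.
            seg_id, seg_type = line.split(',')
            # Assumption: conflicting types for one identifier are an error.
            if id_map.get(seg_id, seg_type) != seg_type:
                raise ValueError('Conflicting types for: ' + seg_id)
            id_map[seg_id] = seg_type
    return id_map


file_a = io.StringIO('#seg_id,seg_type\nsegA_unique_id,segA_type\n')
file_b = io.StringIO('#comment line\nsegB_unique_id,segB_type\n')
merged_map = read_id_maps([file_a, file_b])
```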