- This project contains multiple components:
- (ToDo) The hybkit toolkit of command-line utilities for manipulating, analyzing, and plotting data contained within hyb-format files.
- Analysis pipelines that use the toolkit to analyze qCLASH hybrid sequence data.
- The hybkit Python API, an extensible, documented codebase for creating custom analyses of hyb-format data.
- Hybkit Toolkit:
hybkit includes (or will include) the following command-line utilities for manipulating “.hyb” format data:
- hyb_check.py: Read a “.hyb” file and check it for errors.
- hyb_filter.py: Filter a “.hyb” file to a specific subset of sequences.
- hyb_analyze.py: Analyze and set details for hyb records, such as segment types (segtypes).
- hyb_type_analysis.py: Perform a type analysis on a prepared “.hyb” file.
- hyb_mirna_count_analysis.py: Perform a miRNA count analysis on a prepared “.hyb” file.
- hyb_summary_analysis.py: Perform a summary analysis on a prepared “.hyb” file.
- hyb_mirna_target_analysis.py: Perform a miRNA target analysis on a prepared “.hyb” file.
- hyb_fold_analysis.py: Perform a fold analysis on a prepared “.hyb” file.
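As a rough illustration of the kind of validation a utility such as hyb_check.py performs, the sketch below checks individual lines of a “.hyb” file. It is not the hyb_check.py implementation. It assumes the common 15-column hyb layout (identifier, sequence, folding energy, then six fields per segment: reference name, read start, read end, reference start, reference end, and mapping score), optionally followed by a 16th flags column; the function names are hypothetical.

```python
# Hypothetical sketch of hyb-line validation; NOT the hyb_check.py implementation.
# Assumed layout: ID, sequence, energy, then six fields for each of two segments,
# optionally followed by one extra "flags" field.
SEGMENT_FIELDS = ("ref_name", "read_start", "read_end", "ref_start", "ref_end", "score")
COORDINATE_FIELDS = {"read_start", "read_end", "ref_start", "ref_end"}
N_FIELDS = 3 + 2 * len(SEGMENT_FIELDS)  # 15 required fields


def check_hyb_line(line):
    """Return a list of problems found in one hyb-format line (empty if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (N_FIELDS, N_FIELDS + 1):
        return [f"expected {N_FIELDS} or {N_FIELDS + 1} fields, found {len(fields)}"]
    problems = []
    for seg_num in (1, 2):
        start = 3 + (seg_num - 1) * len(SEGMENT_FIELDS)
        segment = dict(zip(SEGMENT_FIELDS, fields[start:start + len(SEGMENT_FIELDS)]))
        # Coordinates are assumed to be non-negative integers; scores are not checked.
        for name in SEGMENT_FIELDS:
            if name in COORDINATE_FIELDS and not segment[name].isdigit():
                problems.append(f"segment {seg_num} {name} is not an integer: {segment[name]!r}")
    return problems


def check_hyb_lines(lines):
    """Return (line_number, problem) pairs for every problem in an iterable of lines."""
    report = []
    for line_number, line in enumerate(lines, start=1):
        for problem in check_hyb_line(line):
            report.append((line_number, problem))
    return report
```

Because check_hyb_lines accepts any iterable of lines, an open file handle can be passed to it directly to report problems for a whole file.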
These scripts are used on the command line with hyb-format files. For example, to filter a hyb file to contain only sequences with identifiers containing the string “KSHV”:
$ hyb_filter.py ....[command_example]
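The exact hyb_filter.py options are not shown here. As a hedged illustration of the underlying operation only, the following pure-Python sketch keeps the lines whose identifier fields contain a given string. Which fields count as "identifiers" is an assumption: the sketch searches the record ID and the two segment reference names (fields 1, 4, and 10 of the assumed 15-column hyb layout), and the function name is hypothetical.

```python
# Hypothetical sketch; NOT the hyb_filter.py implementation or its options.
# Assumes tab-separated hyb lines where field 0 is the record ID and
# fields 3 and 9 are the reference names of the two segments.
ID_FIELDS = (0, 3, 9)


def filter_hyb_lines(lines, substring, field_indexes=ID_FIELDS):
    """Yield only the lines in which any selected field contains substring."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Guard against short lines so missing fields are simply not searched.
        if any(i < len(fields) and substring in fields[i] for i in field_indexes):
            yield line
```

Since the generator yields the original lines unchanged, its output can be written straight to a new file with the output file's writelines method.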
Hybkit provides several example pipelines for analyzing “.hyb” data with the utilities in the toolkit. These include:
- Summary Analysis: Summarize the sequence and miRNA types in a hyb file.
- Target Analysis: Analyze the targets of a set of miRNAs.
- Grouped Target Analysis: Analyze the targets of a set of miRNAs, with replicates grouped.
- Fold Analysis: Analyze the fold patterns of miRNA-containing hybrids.
- Fold Target Region Analysis: Perform a fold analysis separated by targeted mRNA region.
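To make the target analysis idea concrete, the sketch below tallies, for each miRNA, how many hybrids pair it with each target. This is an assumption-laden illustration, not the hybkit pipeline: it assumes the 15-column hyb layout with segment reference names in fields 4 and 10, assumes a segment's type is the text after the last underscore of its reference name (as in names ending in "_microRNA" or "_mRNA"), and skips miRNA-miRNA hybrids and hybrids without a miRNA. The function names are hypothetical.

```python
# Hypothetical sketch of a miRNA target tally; NOT the hybkit target analysis.
# Assumes fields 3 and 9 are the segment reference names and that each
# reference name ends in "_<type>", e.g. "..._microRNA" or "..._mRNA".
from collections import Counter, defaultdict


def seg_type(ref_name):
    """Return the assumed segment type: the text after the last underscore."""
    return ref_name.rsplit("_", 1)[-1]


def count_mirna_targets(lines, mirna_type="microRNA"):
    """Map each miRNA reference name to a Counter of its target reference names."""
    targets = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 15:
            continue  # skip malformed lines in this sketch
        ref1, ref2 = fields[3], fields[9]
        is_mirna1 = seg_type(ref1) == mirna_type
        is_mirna2 = seg_type(ref2) == mirna_type
        if is_mirna1 and not is_mirna2:
            targets[ref1][ref2] += 1
        elif is_mirna2 and not is_mirna1:
            targets[ref2][ref1] += 1
        # miRNA-miRNA and miRNA-free hybrids are ignored here.
    return targets
```

Counting by (miRNA, target) regardless of segment order means a miRNA found in either position of a hybrid contributes to the same tally.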
The example pipelines produce results in both tabular and graphical form. For instance, the example summary analysis reports the hybrid sequence types found in its input both as a CSV table and as a pie chart.
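The tabular half of such a summary can be sketched with the standard library alone. This is not the hybkit summary pipeline. It assumes the 15-column hyb layout with segment reference names in fields 4 and 10, and that a segment's type is the text after the last underscore of its reference name; the hybrid-type labels, function names, and CSV column names are hypothetical and may not match the pipeline's actual output.

```python
# Hypothetical sketch of a hybrid-type summary; NOT the hybkit summary pipeline.
# Assumes fields 3 and 9 hold the two segment reference names and that each
# name ends in "_<type>", e.g. "..._microRNA" or "..._mRNA".
import csv
from collections import Counter


def seg_type(ref_name):
    """Return the assumed segment type: the text after the last underscore."""
    return ref_name.rsplit("_", 1)[-1]


def count_hybrid_types(lines):
    """Count hybrids by their (order-independent) pair of segment types."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 15:
            continue  # skip malformed lines in this sketch
        # Sort the pair so "microRNA-mRNA" and "mRNA-microRNA" are counted together.
        pair = sorted((seg_type(fields[3]), seg_type(fields[9])))
        counts["-".join(pair)] += 1
    return counts


def write_counts_csv(counts, handle):
    """Write the counts as a two-column CSV table, most common type first."""
    writer = csv.writer(handle)
    writer.writerow(["hybrid_type", "count"])
    for hybrid_type, count in counts.most_common():
        writer.writerow([hybrid_type, count])
```

For the graphical half, the same counts could be passed to matplotlib's pyplot.pie function, using the count values as wedge sizes and the type names as labels.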
- Hybkit API:
Hybkit provides a Python 3 module with a documented API for interacting with records in “.hyb” files, inspired by the object interactions in the Biopython project. Its primary utility comes from the objects that represent individual hyb records: each record exposes accessible attributes and can be analyzed using built-in methods. For example, a workflow that prints the identifiers of only those records in a “.hyb” file that contain a miRNA can be written as follows:
    import hybkit

    in_file = '/path/to/my_hyb_file.hyb'

    # Open a hyb file as a HybFile object:
    with hybkit.HybFile.open(in_file, 'r') as hyb_file:
        # Iterating over the file yields each line as a HybRecord object
        for hyb_record in hyb_file:
            # Analyze each record to assign segment types
            hyb_record.find_types()
            # If the record contains a miRNA segment type, print the record identifier
            if hyb_record.has_property('segtype_contains', 'miRNA'):
                print(hyb_record.id)
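To show roughly what such a record object provides, here is a deliberately simplified, hypothetical stand-in written with the standard library only. It is not hybkit's HybRecord class and does not reproduce its API (find_types and has_property are not mirrored). It assumes the 15-column hyb layout and that a segment's type is the text after the last underscore of its reference name, so miRNAs appear as "microRNA" here rather than hybkit's "miRNA" segtype.

```python
# Hypothetical, simplified stand-in for a hyb record object; NOT hybkit.HybRecord.
# Assumes tab-separated lines with the ID in field 0, the sequence in field 1,
# the energy in field 2, and segment reference names in fields 3 and 9.
from typing import NamedTuple


class SimpleHybRecord(NamedTuple):
    id: str
    seq: str
    energy: str
    seg1_ref: str
    seg2_ref: str

    @classmethod
    def from_line(cls, line):
        """Build a record from one hyb-format line."""
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 15:
            raise ValueError(f"expected at least 15 fields, found {len(fields)}")
        return cls(fields[0], fields[1], fields[2], fields[3], fields[9])

    def seg_types(self):
        """Return the assumed types of both segments (text after the last underscore)."""
        return (self.seg1_ref.rsplit("_", 1)[-1], self.seg2_ref.rsplit("_", 1)[-1])

    def contains_segtype(self, seg_type):
        """Return True if either segment has the given assumed type."""
        return seg_type in self.seg_types()


def ids_with_segtype(lines, seg_type="microRNA"):
    """Return IDs of records that contain a segment of the given type."""
    return [rec.id for rec in map(SimpleHybRecord.from_line, lines)
            if rec.contains_segtype(seg_type)]
```

Wrapping each line in an object with named attributes is what lets the filtering loop read as a question about the record rather than about column indexes.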
Hybkit is still in beta testing. Feedback and comments are welcome; please send them to firstname.lastname@example.org.
Hybkit requires Python 3.6+ and the matplotlib package.
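Before installing, the stated requirements can be checked from Python itself. The following is a minimal sketch using only the standard library; the function name is hypothetical, and it checks only that matplotlib can be found, not which version is installed.

```python
# Minimal sketch: check the stated hybkit requirements (Python 3.6+ and matplotlib).
# The function name is hypothetical; only the presence of matplotlib is checked.
import importlib.util
import sys


def missing_requirements():
    """Return human-readable descriptions of unmet requirements (empty if none)."""
    problems = []
    if sys.version_info < (3, 6):
        problems.append("Python 3.6+ is required, found " + sys.version.split()[0])
    if importlib.util.find_spec("matplotlib") is None:
        problems.append("the matplotlib package is not installed")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("Unmet requirement:", problem)
```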
The recommended installation method is via pip, which automatically handles package versions and dependency installation:
$ pip install hybkit
The package can also be obtained by cloning the project’s GitHub repository:
$ git clone https://github.com/RenneLab/hybkit
Or by downloading the zipped package:
$ curl -OL https://github.com/dstrib/hybkit/archive/master.zip
$ unzip master.zip
In either case, then install the package from within the project directory using Python’s setuptools:
$ python setup.py install
- hybkit Specification
- hybkit Toolkit
- Example Pipelines
  - Example Summary Analysis
  - Example Target Analysis
  - Example Grouped Target Analysis
  - Example Fold Analysis
  - Example Fold Target Region Analysis
- hybkit API