ReVeaL

ReVeaL, Rare Variant Learning, is a stochastic regularization-based learning algorithm.

Method

ReVeaL partitions the genome into non-overlapping, possibly non-contiguous, windows (w). It then aggregates samples into possibly overlapping subsets by subsampling with replacement (the stochastic step), producing units called shingles that are passed to a statistical learning algorithm. Each shingle captures the distribution of the mutational load (the number of mutations in window w for a given sample), which is approximated by its first four moments.
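The two steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not ReVeaL's own implementation: the helper names mutational_load and shingle_moments, the half-open (start, end) window representation, and the parameters shingle_size and n_shingles are all hypothetical, and the "first four moments" are taken here to be the mean, variance, skewness, and kurtosis.

```python
import numpy as np

def mutational_load(positions, windows):
    """Count the mutations of one sample that fall in each window.

    positions: mutation coordinates for a single sample.
    windows: list of half-open (start, end) intervals, assumed non-overlapping.
    """
    positions = np.asarray(positions)
    return np.array([int(np.sum((positions >= s) & (positions < e)))
                     for s, e in windows])

def shingle_moments(loads, shingle_size, n_shingles, seed=0):
    """Summarize shingles of one window by four moments.

    loads: mutational load of every sample in one window.
    Each shingle is a subset of samples drawn with replacement.
    Returns an array of shape (n_shingles, 4) holding
    mean, variance, skewness, and kurtosis (assumed choice of moments).
    """
    rng = np.random.default_rng(seed)
    loads = np.asarray(loads, dtype=float)
    out = np.empty((n_shingles, 4))
    for i in range(n_shingles):
        subset = rng.choice(loads, size=shingle_size, replace=True)
        mean, var = subset.mean(), subset.var()
        std = np.sqrt(var)
        if std > 0:
            z = (subset - mean) / std
            skew, kurt = (z ** 3).mean(), (z ** 4).mean()
        else:
            # A constant shingle has no shape; report zeros by convention.
            skew, kurt = 0.0, 0.0
        out[i] = (mean, var, skew, kurt)
    return out

# Two non-contiguous windows and one sample with four mutations.
windows = [(0, 100), (200, 300)]
print(mutational_load([5, 50, 250, 400], windows))
# Loads of six samples in one window, summarized as three shingles.
print(shingle_moments([0, 1, 1, 2, 4, 7], shingle_size=4, n_shingles=3))
```

Each row of the second result is one shingle, so the rows can serve directly as feature vectors for a downstream learner.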

Figure: ReVeaL flowchart (_images/reveal_flowchart.jpeg)

The entire ReVeaL pipeline can be executed by:

compute(sample_info, label_info, ...[, ...])

This function computes the mutational load and the shingles, and stores them in a folder as CSV files.

Alternatively, one can use the following API:

Pre-processing

Collection of common pre-processing functionalities.

compute_mutational_load(sample_data, ...[, ...])

Compute the mutational load files.

train_test_split(label_info, ...[, ...])

Split the data into training and test sets.

permute_labels(label_info[, seed])

Permute the sample labels.
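The splitting and label-permutation steps listed above can be illustrated generically with NumPy. The sketch below does not call ReVeaL and makes no claim about how its functions are implemented or what arguments they take: the helpers split_labels and permuted, the dict mapping sample IDs to labels, and the test_fraction parameter are all hypothetical. Permuted labels are commonly used to build a null baseline against which a trained model is compared.

```python
import numpy as np

def split_labels(labels, test_fraction=0.25, seed=0):
    """Hypothetical helper: randomly split sample IDs into train and test."""
    rng = np.random.default_rng(seed)
    ids = np.array(sorted(labels))
    rng.shuffle(ids)
    n_test = int(round(len(ids) * test_fraction))
    return ids[n_test:].tolist(), ids[:n_test].tolist()

def permuted(labels, seed=0):
    """Hypothetical helper: reassign the existing labels to samples at random."""
    rng = np.random.default_rng(seed)
    ids = sorted(labels)
    shuffled = rng.permutation([labels[i] for i in ids])
    return dict(zip(ids, shuffled.tolist()))

# Toy label table: sample ID -> class label.
labels = {"s1": "case", "s2": "control", "s3": "case", "s4": "control"}
train_ids, test_ids = split_labels(labels, test_fraction=0.25)
print(train_ids, test_ids)
print(permuted(labels, seed=1))
```

Permuting keeps the class counts unchanged and only breaks the link between each sample and its label.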

Shingle Computation

compute_shingle(train_samples, test_samples, ...)

This function computes the shingles, generating the train and test samples.

An example of usage can be found in this tutorial.

Citation

Please cite the following article if you use ReVeaL:

Parida L, Haferlach C, Rhrissorrakrai K, Utro F, Levovitz C, Kern W, et al. (2019) Dark-matter matters: Discriminating subtle blood cancers using the darkest DNA. PLoS Comput Biol 15(8): e1007332. https://doi.org/10.1371/journal.pcbi.1007332