Interactive visual pattern search in sequential data
using unsupervised deep representation learning

Peax is a novel feature-based technique for interactive visual pattern search in sequential data. Visually searching for patterns by similarity is often challenging because of the large search space, the visual complexity of patterns, and the user's perception of similarity. For example, in genomics, researchers try to link patterns in multivariate sequential data to fundamental cellular or pathogenic processes, but a lack of ground truth and high variance make automatic pattern detection unreliable. We have developed a convolutional autoencoder for unsupervised representation learning of regions in sequential data that can capture more visual details of complex patterns than existing similarity measures. Using this learned representation as features of the sequential data, our visual query system enables interactive, feedback-driven adjustments of the pattern search to adapt to the user's perceived similarity. While users label regions as either matching their search target or not, a random forest classifier learns to weigh the importance of different dimensions of the learned representation. We employ an active learning strategy to focus the labeling process on regions that will improve the classifier in subsequent training.
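To make the feedback loop described above more concrete, the following is a minimal sketch of one common active learning heuristic, uncertainty sampling. It is an illustration under stated assumptions, not Peax's actual sampling strategy, which may weigh additional criteria. The function name `select_uncertain_regions` and the example probabilities are hypothetical; in practice the probabilities would come from a classifier trained on the user's labels, for instance the class probabilities of a random forest.

```python
from typing import List, Sequence, Set


def select_uncertain_regions(
    probabilities: Sequence[float],
    labeled: Set[int],
    k: int,
) -> List[int]:
    """Pick the k unlabeled regions the classifier is least sure about.

    probabilities[i] is the (assumed) predicted probability that region i
    matches the search target. A value near 0.5 means the classifier cannot
    decide, so a user label for that region is likely to be informative.
    """
    # Only regions the user has not labeled yet are candidates.
    candidates = [i for i in range(len(probabilities)) if i not in labeled]
    # Sort by distance from 0.5; break ties by region index for determinism.
    candidates.sort(key=lambda i: (abs(probabilities[i] - 0.5), i))
    return candidates[:k]


# Hypothetical predictions for five regions; region 1 is already labeled.
predicted = [0.95, 0.52, 0.10, 0.45, 0.70]
next_to_label = select_uncertain_regions(predicted, labeled={1}, k=2)
print(next_to_label)  # [3, 4]
```

In an interactive session, the selected regions would be shown to the user for labeling, the classifier would be retrained on the enlarged label set, and the selection would be repeated.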

Video Introduction with an Example from Epigenomics

Presentations

EuroVis 2020 Best Paper Presentation, May 2020
BioVis 2020 Presentation, July 2020
NIH ENCODE, July 2020
Slides from BioIT World, May 2019

Paper

  1. Peax: Interactive Visual Pattern Search in Sequential Data Using Unsupervised Deep Representation Learning

    1. Fritz Lekschas
    2. Brant Peterson
    3. Daniel Haehn
    4. Eric Ma
    5. Nils Gehlenborg
    6. Hanspeter Pfister
    Computer Graphics Forum, 2020. doi: 10.1111/cgf.13971
    • EuroVis 2020 Best Paper Award

Code & Data

The application's source code, the study code, and six pre-trained autoencoders for 3, 12, and 120 kb windows of DNase-seq and histone mark ChIP-seq data are available at:

Authors

  1. Fritz Lekschas

    Harvard School of Engineering and Applied Sciences

  2. Brant Peterson

    Novartis Institutes for BioMedical Research

  3. Daniel Haehn

    Harvard School of Engineering and Applied Sciences
    University of Massachusetts Boston

  4. Eric Ma

    Novartis Institutes for BioMedical Research

  5. Nils Gehlenborg

    Harvard Medical School

  6. Hanspeter Pfister

    Harvard School of Engineering and Applied Sciences