Dissertation Defense: Infrastructure For Scalable Analysis Of Genomic Variation

Adam Novak

Adam Novak, PhD Candidate, Biomolecular Engineering & Bioinformatics
Friday, June 9, 2017 – 11:00am
Location – Biomedical Sciences, Room 200
Host – Professor David Haussler

Title: Infrastructure For Scalable Analysis Of Genomic Variation

Abstract: The scale of the problems that human genomics is asked to solve requires the field to develop an ability to integrate and synthesize information across the entire human population. The abstraction of a single-copy human reference genome assembly, and the linear coordinate space that it induces, are more of a hindrance than a help at these scales. They can only ever represent one sample at any given place, and they make it difficult to combine information about human variation across multiple studies and modalities. To rectify these problems, I propose the construction and adoption of a graph-based alternative to the human reference genome assembly: a Human Genome Variation Map. I present here four research projects. The first is a theory of mapping to references that is extensible to graphs. The second describes a novel data structure for embedding individual haplotype sequences into a graph reference. The third surveys graph construction techniques to discover methods that produce graphs yielding read mapping and variant calling results superior to those obtained with linear, variation-free references. The fourth extends these improvements to chromosome-scale graphs constructed from multiple sources and modalities of variation data. Together, these four projects describe a research program aimed at the eventual release of an official Human Genome Variation Map build, providing a piece of vital infrastructure for the analysis of human genomic variation at population scale.