What Does Your Genome Look Like?
I firmly believe that every biologist should have a favorite scene in the movie Jurassic Park. As the apex of 90’s technoscience ambition in film, it is the darling, flaws and all, of many scientists who were young when it came out. My favorite snippet shows what might be called “virtual reality gene editing” in the research labs where transgenic dinosaurs are created. The camera pans briefly by a white-coated scientist wearing a virtual reality visor and what looks like a pair of Nintendo Powergloves. The screen in front of him shows a DNA strand, which moves with his gloves. Presumably his motions are causing the movement of a real DNA strand, although how wiggling around a short bit of DNA is useful for making dinosaurs must be left to our own imaginations.
The scene blends virtual reality with genetic manipulation in a wet dream of 90’s tech ambition. This pair of DNA-splicing Powergloves illustrates a potent, if fictional, union of genetic manipulation with visual technology, a union that required some prescient crafting of how we might see and interact with genes and genomes.
In the real, dinosaur-less world, scientists and the public are decoding, graphing, and representing genomes, human and otherwise. Genomes have become objects of scientific exploration and unprecedented manipulation at a time when we all enjoy — or suffer through — constant representations of large volumes of data. Genomics research and data visualization have expanded in scope and deepened in sophistication in parallel over the past 20-odd years. This essay is an attempt to understand their unity.
Like other digital arrays lumped under the bland banner of “big data,” genomic data is being constantly translated into graphics. Genomes present the familiar yet wily challenge of neatly displaying thousands to billions of data points in order to foreground trends, differences, correlations, or probabilities. Yet many scientists are comfortable thinking deeply about things that we can’t directly observe (think of the interaction between a DNA-binding protein and a stretch of DNA, or spacetime ripples driven by gravitational waves). Many, perhaps most, scientific advances require that we draw firm conclusions about things that are not visible. Scientists are understandably accustomed to working with visually poor sets of data. But the human genome sequence was born in the public eye, and has quite rightly remained there. Genomics research thus forces scientists and the public into a tricky dance in which researchers wield huge datasets to make new knowledge claims, but must then present these claims in a coherent fashion. Visual representations of the genome are the common language shared by scientists and the public on our joint exploration of the human genome, which is to say, ourselves.
But! This language is not set, as any expert in data visualization will attest. The way the genome gets encoded and represented is a ground that is shifting under our scientific feet, an ongoing challenge to think through genomics visually. Whether or not they realize it (or care), genomics researchers are in the middle of the strange and often opaque process of creating a new visual object. Scientists must seize this chance to sculpt how we all see and understand this new object.
In the 1970s, the study of genetics shifted decisively from heritability of wrinkled peas or fruit fly eye color into molecular biology. Genetics was not something that begged to be displayed. Figures in academic articles on genetics required significant background knowledge and interpretation. In the public’s visual lexicon, the DNA double helix itself, while known, was not an image in common use with broad recognition. No member of the public carried small colorful screens with them all day the way we do now. The use of infographics and animations was more limited, and of course interactive graphics as we know them today didn’t exist.
Today we turn raw data of all kinds into interpretable visualizations without a second thought. Both inside and outside of academia, we live in a lush ecosystem of data visualizations. If genome research is important, the public expects to be presented with visual representations of it (try a Google image search for “genome”). Yet the genome is not something we can just snap a picture of and “see”. Genomic data are displayed as a sequence of letters, regions of difference, degrees of correlation, amount of binding by this or that protein, scrunched up chromosomes, or 3D spatial maps.
But the human genome is more than a laundry list of biological parameters. It is also the touchstone of individuality, the kernel of our self, the master of our fates, the footprint of our ancestors, the shared feature of our species. The status granted to the genome makes the stakes incredibly high for how it gets represented and communicated.
Not surprisingly, the organizations that have thought the most about this are companies like 23andMe, which make their living communicating genomic data to their customers. That is to say, the product 23andMe is selling is your genome (ok well it’s a SNP array of your genome) wrapped in some very compelling visualizations. When you get results from 23andMe, the glossy box reads “Welcome to You.” The company pays exquisite attention to design and data visualization, elaborating a completely visual language to communicate genomics. Anne Wojcicki, the CEO, had a telling exchange when she was interviewed by Neil deGrasse Tyson. At the start of the interview Wojcicki says to him, “You’ve probably never seen your genome.” To which Tyson replies, “No I never have.” Such an exchange would have been meaningless just a few years ago, and now it seems perfectly natural.
Understanding what genomic data MEANS is wrapped up intimately with how we see. As Stephen Few once wrote, “information visualization helps us think.” It is high time that scientists, like ancestry companies using overblown SNP analyses, integrate visual thinking into research. Where do genes live? How can we imagine them within coding regions on specific chromosomes, with this or that set of DNA-binding proteins? How different am I from my parents or from a chimpanzee or from a banana? Where do those differences lie and how do they exert their effects? When are these genes expressed across time? Where are they expressed across space in my body? Questions like these pose a familiar difficulty: connecting meanings within one field of biology or at one scale to ramifications in another. Visualization is a conceptual tool that helps make these leaps. It can guide our thinking of biology across sub-fields, across disparate measurement devices, across scales of time or space. One shining success of a conceptual leap made by visualization was published by Conrad Hal Waddington in 1957. Before the molecular details existed to support it, Waddington’s model of epigenetics captured and communicated his conceptual advance.
As dumb as it sounds, the question genomics researchers might begin to ask themselves is “how does it look?” This is not to suggest that they paint pretty pictures of every finding for the sake of making pretty pictures. Rather, the question might be a helpful signpost for communicating (and thus understanding) genomics. The simple questions (How does it look? How could it look?) might push scientists to align the power and flexibility of data visualization with the hard, error-prone work of making biological claims relevant across fields. And by now, scientists can no longer give themselves a pass and leave the question of visual representation to the data viz professionals. It must be a lively, aspirational part of the daily work of genomics research. That attention will be rewarded by a public that is already accustomed to seeing and interpreting large arrays of data.
Finally, the hardest part, in my mind, is that genome visualizations must (MUST!) communicate uncertainty. This essay is not about how we represent the perfectly sorted out and fully mapped genome. It is about how we represent ongoing genome research, a lively field that must constantly correct itself. The difficulty in communicating degrees of accuracy or certainty is one of the greatest hurdles in communicating science to the public. This is not, generally, what 23andMe is trying to sell, but it is absolutely central to genomics research. Fortunately, data visualization experts have been pondering this for over a century, and many visual tools exist already.
Is it time for virtual reality gene splicing? Maybe, maybe not. It is absolutely time to treat genomics as a field of study rooted in our visual lexicon. Drawing pictures of 3 billion letters sounds like a paradox — it is this paradox that holds promise and hope for genomics research.
[This first appeared in 2017 on Medium: https://medium.com/@timisstuck/-2a6077b17313]