Wednesday, November 27, 2013

On Not Dying Young: Fatal Illness or Flawed Algorithm?

Lukas F. Hartman, in a November 26, 2013 posting titled "Why 23andMe has the FDA worried: It wrongly told me I might die young," demonstrates the need for skepticism and oversight of many algorithm-based decision models.


23andMe is one of many companies offering at-home genetic testing; in September it reported that its database had reached 400,000 people. Scientists have raised questions about the accuracy of the tests, and in May 2011 a Dutch study claimed the tests were inaccurate and offered little to no benefit to consumers. 23andMe’s $99 Saliva Collection Kit and Personal Genome Service (PGS) claims to analyze saliva, providing data that shows users how their genetics may affect their health and exploring their personal ancestry. The company is backed by Google.

The US Food and Drug Administration (FDA) recently ordered 23andMe to “immediately discontinue” the marketing of a genetic screening service, after the company failed to send the agency information supporting its marketing claims. “FDA is concerned about the public health consequences of inaccurate results from the PGS device; the main purpose of compliance with FDA’s regulatory requirements is to ensure that the tests work,” Alberto Gutierrez, director of the FDA’s Office of In Vitro Diagnostics, wrote in the letter, which was dated 22 November 2013 and addressed to 23andMe co-founder Anne Wojcicki.

An Unwelcome Surprise

Mr. Hartman signed up for 23andMe in November 2010. He sent them his saliva and received a web login to his genome in return.


23andMe extracts a sort of gene soup from a person's saliva and pours it onto a DNA microarray chip made by a company called Illumina. These chips are covered with thousands of little testing probes. Each probe is a cluster of molecules to which matching pieces of the person's DNA naturally attach; the molecules are designed to light up when a match occurs. Hundreds of thousands of chemical tests run in parallel on the chip. The result is an image that a computer scans and compares against a database of so-called SNPs, “snips.” According to Wikipedia, these “single nucleotide polymorphisms” make up about 90% of all genetic variation in the human genome. So when 23andMe detects a SNP variation in a person's genome, it means that a base pair of that person's DNA differs from the so-called “reference genome.”

To sum up, 23andMe compares hundreds of thousands of scanned SNPs against its database, which is constantly updated in response to new scientific studies and sources. The website then shows you nicely designed, ready-to-ingest interpretations of how your genetic variations manifest as health risks. Every time there are new updates to “Health Risks” or “Inherited Conditions,” you receive an email.
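The comparison step described above can be sketched in a few lines. This is a minimal illustration, not 23andMe's actual pipeline; the function name and return labels are invented for the example.

```python
def classify_call(genotype: str, reference_allele: str) -> str:
    """Classify a diploid genotype call (e.g. 'AG') against the
    reference base at that position."""
    # Count how many of the two alleles differ from the reference.
    variants = sum(1 for allele in genotype if allele != reference_allele)
    if variants == 0:
        return "matches reference"
    if variants == 1:
        return "heterozygous variant"
    return "homozygous variant"

print(classify_call("AA", "A"))  # matches reference
print(classify_call("AG", "A"))  # heterozygous variant
print(classify_call("GG", "A"))  # homozygous variant
```

The distinction between the last two cases, heterozygous versus homozygous, turns out to be the crux of Mr. Hartman's story.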

Everything went well for a long time; there were no special surprises. But some weeks ago there was, suddenly, an unnerving update in Mr. Hartman's inherited-conditions report. He clicked the link and a warning appeared: you have to specifically agree before 23andMe will show you potentially unnerving, life-changing results. He clicked OK and was forwarded to the result. It said:
Has two mutations linked to limb-girdle muscular dystrophy. A person with two of these mutations typically has limb-girdle muscular dystrophy.
Mr. Hartman let that sink in for a moment. He had never heard of this illness before. “Some people with limb-girdle muscular dystrophy lose the ability to walk and suffer from serious disability,” said the page, showing Mr. Hartman an image of a smiling physical therapist treating a smiling patient. What 23andMe didn’t spell out, but Wikipedia did, was that LGMD potentially ends with death.

Coding Error or Genetic Condition?

Mr. Hartman downloaded his 23andMe data and poked at it with a text editor. He read cryptic articles about genetic engineering and installed a genome analysis tool, “Promethease,” which can import 23andMe raw data, among other formats; in contrast to 23andMe, it tells you even the very unnerving stuff. Someone had found a bug in Mr. Hartman, and he tried to reproduce it.

Technically speaking, 23andMe detected two SNP variations in Mr. Hartman's genome, called rs28933693 and rs28937900. So he set out to find out more about these mutations. When you look up “rs28933693” in SNPedia, a kind of Wikipedia for SNPs, you’ll find a link to an entry in OMIM (Online Mendelian Inheritance in Man). The entry features excerpts from medical studies of LGMD patients who all had the same so-called homozygous mutation at a certain gene location.
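Poking at the raw data with a text editor works because a 23andMe export is a plain tab-separated file: comment lines starting with “#”, then one line per SNP with the rsid, chromosome, position, and genotype. A short sketch of scanning such a file for the two rsids in question (the file name is hypothetical):

```python
def find_snps(path, rsids):
    """Scan a 23andMe-style raw data file for specific SNP IDs.

    Each data line is tab-separated: rsid, chromosome, position,
    genotype. Lines beginning with '#' are comments.
    """
    wanted = set(rsids)
    hits = {}
    with open(path) as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            rsid, chromosome, position, genotype = line.rstrip("\n").split("\t")
            if rsid in wanted:
                hits[rsid] = {"chromosome": chromosome,
                              "position": position,
                              "genotype": genotype}
    return hits

# e.g. find_snps("genome_raw_data.txt", ["rs28933693", "rs28937900"])
```

Tools like Promethease do essentially this, then cross-reference each hit against SNPedia's annotations.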

To understand what this means, recall that humans are diploid organisms: we have two copies of each chromosome, one inherited from the mother and one from the father. A heterozygous mutation affects only one of the two copies; a homozygous mutation means that the same location on both copies differs in the same way.

Being diploid is a good thing; it means we potentially have a backup for every critical function of the body. So if a piece of a person's DNA encodes a critical enzyme and that code is “broken” on one chromosome copy, it may well be intact on the other. If you’re out of luck and both of your parents are “carriers” of exactly the same mutation, the inherited condition may manifest in you. This was the case with the LGMD patients in the study Mr. Hartman stumbled upon: both copies of the relevant chromosome region were mutated in the same (homozygous) way, which triggers the muscular dystrophy. This happens very rarely, but it happens.

After some hours of tense research, Mr. Hartman looked closer at the data 23andMe provided as a download. Yes, he really had two mutations, but they were not on the same gene; they were on two different genes. By rare chance, both of these mutations are statistically linked to LGMD, but to two different versions of LGMD. So he didn’t have a homozygous mutation, but two unrelated heterozygous ones. The web programmers at 23andMe had, in their code, added those two mutations together into one homozygous mutation. And so the algorithm switched to red alert.
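A hypothetical reconstruction of the kind of bug described (23andMe's actual code is not public, so the names and data layout here are invented) makes the error concrete: counting variant alleles across all loci gives the same total for “one homozygous mutation” and “two unrelated heterozygous mutations,” even though only the first is the dangerous case.

```python
def buggy_is_homozygous(calls):
    # Wrong: sums variant alleles across *all* loci, so two unrelated
    # heterozygous mutations (1 + 1) look like one homozygous mutation.
    return sum(call["variant_alleles"] for call in calls) >= 2

def correct_is_homozygous(calls):
    # Right: homozygous means two variant alleles at a *single* locus.
    return any(call["variant_alleles"] == 2 for call in calls)

# Mr. Hartman's situation: one heterozygous call on each of two genes.
hartman = [
    {"rsid": "rs28933693", "variant_alleles": 1},  # heterozygous, gene A
    {"rsid": "rs28937900", "variant_alleles": 1},  # heterozygous, gene B
]

print(buggy_is_homozygous(hartman))    # True  -> false red alert
print(correct_is_homozygous(hartman))  # False -> no homozygous mutation
```

The fix is a one-line change in how the alleles are grouped; the genetic data itself was never wrong.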

Mr. Hartman sent a support request to 23andMe including his research and conclusions (in software engineering this would be called a “bug report”). After a few days of waiting, 23andMe confirmed the bug and apologized. So the bug was not inside Mr. Hartman, but in the algorithm. An algorithm can be fixed easily, unlike someone's genetic code.

False Positives, False Negatives and the Risks of Automation Bias

Human judgment is subject to an automation bias which, as discussed in a 2010 law review article, fosters a tendency to disregard or not search for contradictory information in light of a computer-generated solution that is accepted as correct. Such bias has been found to be most pronounced when computer technology fails to flag a problem.

In a study from the medical context, researchers compared the diagnostic accuracy of two groups of experienced mammogram readers (radiologists, radiographers, and breast clinicians), one aided by a Computer Aided Detection (CAD) program and the other without access to the technology. The study revealed that when the CAD did not flag a concerning presentation, the first group was almost twice as likely as the second group, which did not rely on the program, to miss signs of cancer.

The false positive for limb-girdle muscular dystrophy that 23andMe emailed Mr. Hartman is clearly problematic, but the risk of a false negative, when combined with automation bias, is potentially catastrophic.

BRCA1 Gene
Consider, for illustrative purposes, coding errors resulting in false negatives for the breast cancer 1, early onset (BRCA1) gene. The strands of the DNA double helix are continuously breaking from damage; sometimes one strand is broken, and sometimes both strands are broken simultaneously. BRCA1 is part of a protein complex that repairs DNA when both strands are broken.

Researchers have identified more than 1,000 mutations in the BRCA1 gene, many of which are associated with an increased risk of cancer. Researchers believe that the defective BRCA1 protein is unable to help repair DNA damage, leading to mutations in other genes. These mutations can accumulate and may allow cells to grow and divide uncontrollably to form a tumor.

Women who have inherited a defective BRCA1 gene face risks of breast and ovarian cancer that are so high, and seem so selective, that many women with BRCA1 mutations choose to have prophylactic surgery. Why? Bilateral prophylactic mastectomy has been shown to reduce the risk of breast cancer by at least 95 percent in women who have a mutation in the BRCA1 gene. A woman receiving a false negative for a BRCA1 mutation would not consider prophylactic surgery. Why should she, when she has no BRCA1 mutation?

A false negative creates a false sense of security and restricts a woman's right to choose. To choose whether to have prophylactic surgery; to choose to have more intense monitoring; to choose alternative therapies; to choose life.

* * * * *

The potential and pitfalls of an increasingly algorithmic world raise the question of whether legal and policy changes are needed to regulate our changing environment. Should we regulate, or further regulate, algorithms in certain contexts? What would such regulation look like? Is it even possible? What ill effects might regulation itself cause? Given the ubiquity of algorithms, do they, in a sense, regulate us?


