In Jonathan Coulton's song "That Spells DNA" (lyrics here), he gives some sequences:

If it says TGGTCGAAC
Then you might get the cancer
Then you shouldn’t eat shrimp or nuts
Then you’ll probably wish that you didn’t know

In all likelihood, these are just random sequences... but I can't help wondering if these are actual sequences from the human genome containing SNPs related to cancer, shrimp/nut allergies, and Huntington's Disease. (I suggest Huntington's because it's the textbook example of an awful disease with a relatively simple, accurate genetic test, and you hear so much about people voluntarily not getting tested for it.)

I'm new to bioinformatics and have no idea where to start answering this question. What database(s) would be good to search? These are very short sequences, but in combination with the suggestions for related diseases, is the search at all feasible?


These are almost certainly random sequences, and sequences this short will each occur in the human genome many times by chance alone. Just out of interest, I checked the longest (bottom) one, and more than 2000 perfect matches come up on almost every chromosome.
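To get a feel for why such short sequences cannot be specific, here is a rough back-of-the-envelope sketch. It assumes a genome of about 3.1 billion bases in which all four bases are equally likely and independent, which is not true of real genomes, so treat the result as an order-of-magnitude estimate only. The function name `expected_hits` and its parameters are made up for this illustration.

```python
def expected_hits(k, genome_length=3_100_000_000, both_strands=True):
    """Expected number of chance occurrences of one specific k-mer.

    Assumes uniform, independent bases (a simplification) and an
    approximate human genome length of 3.1 Gb (also an assumption).
    """
    per_position = 0.25 ** k          # probability of a match at one position
    strands = 2 if both_strands else 1
    return genome_length * per_position * strands


# The 9-base sequence quoted from the song
print(round(expected_hits(len("TGGTCGAAC"))))
```

Under these assumptions a 9-mer is expected tens of thousands of times, and each extra base only cuts the expected count by a factor of four, so a query needs to be considerably longer before a perfect match becomes meaningful.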

As for Huntington's, the disease is caused by a polyglutamine repeat expansion, so one would expect to see a long run of the same CAG codon (around 40 or more consecutive copies) in the coding region of the huntingtin (HTT) gene, which none of the above sequences have.
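A minimal sketch of that check, assuming you simply want the longest run of back-to-back CAG codons in a sequence string (the helper name `longest_cag_run` is hypothetical, and this is not how a clinical test works):

```python
import re


def longest_cag_run(seq):
    """Return the largest number of consecutive CAG repeats found in seq."""
    # Each match is one maximal stretch of CAGCAGCAG...
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


print(longest_cag_run("TGGTCGAAC"))           # the sequence quoted from the song
print(longest_cag_run("AT" + "CAG" * 5 + "G"))  # a toy sequence with 5 repeats
```

A sequence only a handful of bases long cannot contain dozens of CAG copies, so none of the lyric sequences could represent the Huntington's expansion.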