Andrew Sharp, Ph.D.

Dept. of Genetics and Genomic Sciences  |  Mount Sinai School of Medicine

Long-Imagined Experiment Finally Possible – A Comprehensive Study of CNVs Across Complex Regions in the Human Genome

Dr. Andrew Sharp has focused his scientific career on studying structural variation, including copy number variants, in the human genome. For years, Dr. Sharp found himself in a frustrating situation – he had an interesting idea for a research project but he didn’t have the right tools to implement it.

“I was interested in looking at all the genes that are present in the human genome at multiple copies, including those in tandem repeat regions. But there was always a technical problem: the technologies we had, microarrays, didn’t work very well. So, I never embarked on these kinds of large-scale studies because I thought, this just isn’t going to work.”

Microarrays don’t work well for analyzing regions with high variability in gene copy numbers, explains Dr. Sharp, because “after 5-10 copies the response is not at all linear. When you get up to very high copy numbers you get very large differences in intensity and the camera saturates the signal.”
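The saturation problem Dr. Sharp describes can be illustrated with a toy model. The sketch below is not the actual microarray response curve: the function names, the per-copy signal, and the detector ceiling are all hypothetical values chosen only to show why a saturating readout compresses differences at high copy numbers while a linear readout preserves them.

```python
def linear_signal(copies, per_copy=1000.0):
    """Idealized readout: signal is proportional to copy number."""
    return per_copy * copies


def saturating_signal(copies, per_copy=1000.0, ceiling=16000.0):
    """Hypothetical detector whose signal approaches a ceiling.

    At low copy numbers the response is close to linear; at high
    copy numbers it flattens out toward `ceiling`.
    """
    raw = per_copy * copies
    return ceiling * raw / (ceiling + raw)


def fold_change(signal_fn, low, high):
    """Ratio of the signal at `high` copies to the signal at `low` copies."""
    return signal_fn(high) / signal_fn(low)


if __name__ == "__main__":
    for low, high in [(1, 2), (10, 50)]:
        print(
            f"{low} -> {high} copies: "
            f"linear x{fold_change(linear_signal, low, high):.2f}, "
            f"saturating x{fold_change(saturating_signal, low, high):.2f}"
        )
```

In this toy model, going from 1 to 2 copies still nearly doubles the saturating signal, but a true five-fold difference between 10 and 50 copies shows up as less than a two-fold difference in signal, which is why high copy numbers become hard to tell apart.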

“I needed a cost-effective technology that would allow me to look at a few hundred genes at a time in one assay and give me an accurate answer. This is where NanoString came in,” he says.

When he learned about the nCounter CNV CodeSets, he immediately began planning his research project. His team first did a computational analysis of the human genome to identify which genes were present in multiple copies. They found approximately 200 with two or more copies, with some genes present in up to 50 copies. They then designed a custom nCounter CodeSet with probes tied to each of these multi-copy genes. They screened 165 HapMap samples to survey normal human populations and used DNA from other primate species – chimp, macaque, gorilla, and gibbon – to assess how these genes may have changed during primate evolution.

The team found that about two-thirds of the genes they were studying were present at two or more copies in the reference genome and were highly variable between individuals. “In the most extreme cases we found a gene that could be present at as many as 250 copies. For genes we knew from other data were highly variable we used multiple nCounter probes and had consistent results with the replica probes. We compared the results using other technologies, for example, Southern blots and qPCR, and concordance was excellent.”

The gene that tested at the highest copy number, REXO1L2P, is represented in the human genome reference assembly by only three copies, but it sits adjacent to a large assembly gap. “The fact that it’s next to a gap is probably because this region is so highly variable. This was a common theme in our analysis: approximately one-third of these multi-copy genes have assembly gaps right next to them,” says Dr. Sharp.

When they compared the human results with data from the other primates, they found even more variation. REXO1L2P again provided an extreme case, with one of the gorillas tested having 1000 copies of the gene. “While it’s probably doing something interesting, it’s obviously not causing anything serious. But at this point we really don’t know what it does,” said Dr. Sharp.

The differences they found between humans and other primates in many copy-number-variable genes, and the fact that these genes tend to be involved in immunity and in antibacterial and antiviral activity, suggest that these genes are probably important evolutionarily, says Dr. Sharp. Also, in comparison to normal unique genes, they found a rapid rate of change at the amino acid level, “suggesting some selective pressure that’s making this happen,” he said. “Obviously if you have gain or loss of whole genes among certain people these genes are probably doing something important.”

Dr. Sharp has shown with his nCounter CNV experiments that SNP typing with genome-wide association studies misses a significant source of variation, specifically multi-copy genes in tandem repeats. “What we know about tandem repeats is that they are very variable even from generation to generation, compared to SNPs which are very stable. To demonstrate this we compared our gene set with copy number variation with nearby known SNPs and asked – ‘if we take a SNP marker that is near a multi-copy gene, does genotyping alone tell us anything about how many copies of that gene we’ll have?’ For the genes we were looking at the answer is no.”
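The question posed above, whether a nearby SNP genotype predicts gene copy number, is commonly framed as a squared correlation (r²) between the two. The sketch below is one possible way to compute it and is not the team’s actual analysis: the function names are hypothetical, genotypes are assumed to be coded as 0, 1, or 2 copies of one allele, and the sample values are invented purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no predictive information
    return sxy / sqrt(sxx * syy)


def tagging_r2(genotypes, copy_numbers):
    """Fraction of copy number variance explained by the SNP genotype."""
    return pearson_r(genotypes, copy_numbers) ** 2


if __name__ == "__main__":
    # Invented data: genotype coded 0/1/2, one value per individual.
    genotypes = [0, 0, 1, 1, 2, 2]
    tagged_copies = [2, 2, 3, 3, 4, 4]          # copy number tracks the SNP
    untagged_copies = [10, 30, 10, 30, 10, 30]  # copy number ignores the SNP
    print(f"tagged:   r^2 = {tagging_r2(genotypes, tagged_copies):.2f}")
    print(f"untagged: r^2 = {tagging_r2(genotypes, untagged_copies):.2f}")
```

An r² near 1 would mean the SNP is a good proxy for copy number; an r² near 0, as in the second invented example, corresponds to the situation Dr. Sharp describes, where genotyping the SNP says nothing about how many copies of the gene are present.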

As a next step, Dr. Sharp’s team is now interested in looking at how large tandem repeats may be a novel mechanism for gene regulation, and in taking some of these genes that they’ve identified as highly variable and doing specific disease association studies.

“For me one of the things that’s really nice about the NanoString technology is that it’s just so simple – you have your DNA sample, you digest it, you mix it with the probes, you put it on the machine and an Excel file spits out,” says Dr. Sharp. “In a week you can screen hundreds of individuals and the data that you get out of the machine is a file that takes you about 15 minutes to analyze. It’s pretty rare that you get a technology that is that user friendly.”