Webinar: NCBI Human Variation and Medical Genetics Resources

Author: NCBI

Hello, everyone. This is Peter Cooper from NCBI. This is the second webinar today, and this one is going to focus on human variation and medical genetics. You should be seeing the informational slide there. We’ll have a Q&A available after the webinar.

It’s going to be on the Webinars and Courses page; that’s where the materials will be linked and where the YouTube (inaudible) will be linked. The materials, the Q&A, and all those things are also on the ftp site. That compressed URL will take you there, and I’ll probably visit that site at the end of the slides. I just wanted to give another shout-out to the people at Des Moines University. They’re watching this from a classroom out there, and they’re the ones who arranged to have this webinar today.

Okay, so this afternoon we’re going to talk about human variation and our allied medical genetics resources. My contact information is there on the slide. You can write to me with any questions about the content or about anything else for that matter.

By the way, I didn’t mention this in the first webinar, though I’ve said it several times when we do these webcasts: I’m the person who is basically in charge of the outreach program here at the Service Desk. Bonnie Maidak, who is also part of that program, is here with me to answer questions. So today I’m going to talk about variation resources. Some might call them databases. We often use that term, but it’s kind of a loose term. I also want to talk about medical genetics resources, which are distinct from those but overlap with them in some ways.

Then we’ll talk about how to get to this data using the Entrez system. Really the main way we’re going to do it is from Gene and through one of the browsers. In fact, we’re going to use the Variation Viewer, which is one of our dedicated genome browsers.

We’ll do that via live searches after a few slides here. The scope of what we’re talking about today is these six resources on the slide. The one in the middle, dbGaP, is in gray. We won’t talk about that one so much because the data in it, which we’ll discuss in just a moment, are largely controlled-access data, so we can’t really demo much with it, although we will see some of the analysis results. These six databases span the gamut from the purely variation resources to the medical genetics resources.

dbGaP and ClinVar sort of sit in the middle. dbSNP and dbVar are more on the sequence variation end, and MedGen and GTR are more in the medical genetics realm. Over there on the side I have a box that says 1000 Genomes Project because we’re going to use that data today, and it’s an important source of data that goes into dbSNP and dbVar.

We’ll talk first about the things that I would call sequence variation databases, and in that category we’ll include ClinVar, which spans sequence variation and medical genetics. dbSNP is our database of small-scale variants. It contains submitted data and what we call reference SNPs (refSNPs), which are the non-redundant, aggregated records for submissions of the same variant. dbVar is a similar concept, but it’s for large-scale variants. We have variant calls and variant regions, which are sort of equivalent to the refSNP. There are no reference variants, however, for dbVar. ClinVar is a database that contains variant-phenotype assertions.

It contains phenotypic assertions about large- and small-scale variants: the conditions a variant is associated with, whether it is pathogenic, those kinds of things. These are submitted from a large number of sources. And dbGaP, and this is really the last time we’ll talk about it in any detail in the slides, holds the genotype-phenotype association studies that contain personally identifiable information. So in order to access that data, you need permission granted by the data access committee of your university and one here at NIH. It contains genome-wide association studies, medical sequencing studies, and molecular diagnostic assays, and like I said, you have to be granted access to the individual-level data. Okay, so let’s talk a little bit more about each one of these individually. dbSNP is NCBI’s small-scale variation database. These are sequence variations that are nominally less than 50 base pairs.

It contains rare variants and common polymorphisms. So the name of the database is a little misleading. Single Nucleotide Polymorphism is what SNP ostensibly stands for.

Polymorphism means that it’s present in greater than one percent of the population. Many of the things that are in dbSNP are not polymorphisms in that sense, they’re rare variants. So sort of a better name for what we’re talking about here is SNVs, for the most part, Single Nucleotide Variants, although there are some simple repeats in there and some insertion-deletion variations.
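As a rough illustration of the one percent convention just described, here is a minimal Python sketch that labels a variant by its allele frequency. The function name, the labels, and the example frequencies are made up for illustration; nothing here comes from dbSNP itself.

```python
def classify_by_frequency(allele_frequency: float, threshold: float = 0.01) -> str:
    """Label a variant using the 'greater than one percent' convention.

    The 0.01 threshold follows the definition given in the talk; the
    function itself is a hypothetical helper, not a dbSNP tool.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # Strictly greater than the threshold counts as a polymorphism.
    return "polymorphism" if allele_frequency > threshold else "rare variant"


# Made-up frequencies, for illustration only.
for freq in (0.15, 0.001):
    print(freq, classify_by_frequency(freq))
```

By this definition, many dbSNP records would come out as rare variants, which is why SNV is the more accurate term.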

There are primary records, which are submitted, and derivative records, which are the aggregate refSNP records. When we look at those, we’re going to be looking mainly at the refSNPs. At the last count that I did, there were 105 million of these human refSNP records.

The genome is basically covered with these variants. Essentially a SNP record, at its most basic, is simply a placement of a variant onto a genomic sequence or a reference sequence transcript. There are other things included in the SNP record. There’s one called Significance, which comes from the submitter and from ClinVar, which is where some of the records are from. That’s been validated by dbSNP. The global allele frequency, which is shown here on this record, comes from the 1000 Genomes Project data. This is a SNP summary for a pretty famous polymorphism that’s present in the apolipoprotein E (APOE) gene.

One of the alleles is a risk factor for late-onset Alzheimer’s disease. The other straight variation database, the way that I’m talking about them today, is dbVar. This is our database of large-scale variation. Most of the things that are in here are things that we would call copy number variations, which are very prevalent in human populations.

You can see the statistics on the slide. I won’t go through the boring details of all those numbers, but there are about three million of these copy number variations represented in this database, and there are about 4.7 million variant regions. I’ll come back to what a variant region is in just a moment. A dbVar record is basically just the breakpoints on a reference sequence and the corresponding variations. I didn’t make note of this on the last slide, but a useful way of writing these things down is to use Human Genome Variation Society (HGVS) notation. It shows you a sequence, and it tells you where on that sequence the variation is. It can also express uncertainty about where the start points and the end points are. The one that I picked here happens to be a copy number variation that’s a duplication. In fact, that’s one of the variants shown on this next slide, in a graphical sequence viewer.
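To make the idea of "a sequence plus where on that sequence the variation is" concrete, here is a small Python sketch that splits a simple genomic HGVS substitution into its parts. It is only a sketch under stated assumptions: it handles plain single-base substitutions, not the duplications or uncertain breakpoints mentioned above, and the position in the example is invented for illustration.

```python
import re
from typing import NamedTuple, Optional


class GenomicSubstitution(NamedTuple):
    accession: str   # versioned reference sequence, e.g. a chromosome accession
    position: int    # 1-based position on that sequence
    ref: str         # reference base
    alt: str         # variant base


# Only matches the simplest form: ACCESSION.VERSION:g.POSITIONref>alt
_SIMPLE_G_SUB = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_genomic_substitution(expression: str) -> Optional[GenomicSubstitution]:
    """Return the parts of a simple g. substitution, or None if it does not match."""
    match = _SIMPLE_G_SUB.match(expression.strip())
    if match is None:
        return None
    return GenomicSubstitution(
        match["acc"], int(match["pos"]), match["ref"], match["alt"]
    )


# Hypothetical expression: the position 1000 is invented for illustration.
print(parse_genomic_substitution("NC_000018.10:g.1000A>G"))
```

A full HGVS parser has to cover many more variant types; a dedicated library would be the better choice for real data.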

Here we’re looking at a region on chromosome 6 surrounding complement factor B, and you can see there, right above my one variant region tag, that this is a highly variable region in the human genome. We’ll actually go into this region later on today as an example. What we have here is a variant region that has this nsv accession. There are individual calls, four of them, that go underneath it, so we have the variant region and the individual calls for that particular region. I want to pause here and leave the concept of the databases for a moment just to mention a set of data that we’re going to use today that feeds into both dbVar and dbSNP. This is the 1000 Genomes Project data, which, of course, many people are well aware of. This has been a huge multi-center project. They sequenced, at some coverage level, about 2,500 individuals from 26 different human populations around the world.

There are five so-called super populations that represent major continents. About 84 to 85 million SNPs have gone into dbSNP, and 60,000 structural variant regions have gone into dbVar. We’ll look at this data later on today in the 1000 Genomes browser. We’ll look at a common SNP that is actually present in ClinVar as well. So that brings me to talking about ClinVar.

ClinVar is a database that straddles the area between the variation data and the medical genetics data. It contains variants and corresponding assertions about those variants. Again, it’s a database that accepts submissions of both small and large variants, including copy number variants, but included with those are the phenotypic assertions that go along with them. Is this variant pathogenic? Is it associated with a condition? And there are accession numbers in the same sort of way that there are for dbSNP: these RCV accession numbers aggregate data across the same variant and condition being reported by different people.

Then there are submissions that have SCV accessions. The purpose of ClinVar is to bring all this stuff into one place and standardize the way it’s represented, using the HGVS nomenclature. As you probably know if you’ve worked with the literature on human variation, there are a lot of different ways of representing the same variant, so ClinVar standardizes that.

It also curates information; some of the information in there is coming from dbGaP, dbSNP, and dbVar. It also provides a review status for each one of these assertions that gives you some information about how confident you can be in that particular assertion. So here are some statistics. There are just about 140,000 of these variant records. When looking at a particular variant, there may be more than one phenotype associated with that variant.

Maybe more than one submitter, too. You can see the clinical significance breakdown here on the left-hand side of this slide: about 46,000 pathogenic variants. Then you can see there is a star rating system, from practice guideline, which would be the highest star rating, on down to the one where basically there are no assertions or no criteria provided for those assertions. So now I want to move on and mention some of our medical genetics resources. There are really two main ones at NCBI.

One is MedGen, which is a sort of aggregator for human phenotypes; it draws on several controlled vocabularies of phenotypic terms as a way of unifying them into concepts. The other database at NCBI is a submission-driven database called the Genetic Testing Registry (GTR), which contains submitted genetic tests. They are submitted by the labs themselves, but if you were a clinician you could go there and figure out what tests your patient needs and how you would go about ordering them. In the middle of the slide is a database that is no longer housed or served at NCBI. It’s now at its own URL: the Online Mendelian Inheritance in Man (OMIM) database. This is a literature database which has articles about human disease genes, phenotypes, and (inaudible).

This information is also included in ClinVar, and it provides some of the vocabulary for MedGen and GTR. But it’s a literature-only type of database. It’s still searchable here at NCBI, but if you want to see the records, you need to go to the OMIM site. So if you’re interested in human variation, how do you go about accessing the data on the web? Certainly these databases, these resources, are available if you go to the NCBI home page.

There is a search box there with a pull-down menu, and these are all the various resources at NCBI that you can search. So ClinVar, dbGaP, dbVar, GTR, MedGen, OMIM, and SNP can all be searched in the Entrez system. We even have a way of searching all the databases at once, and this is a condition that I’m going to use as an example today. It’s called right ventricular cardiomyopathy. We’ll be looking at a particular form of that.
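The webinar does all of its searching through the web pages. For readers who script their searches, the sketch below builds one Entrez search URL per database using NCBI’s E-utilities `esearch` endpoint. E-utilities is not covered in this talk, and the short database identifiers in the tuple are my assumption of the Entrez names for the resources listed above, so treat this as a starting point. The code only builds the URLs; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities esearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed Entrez identifiers for ClinVar, dbGaP, dbVar, GTR, MedGen, OMIM, and SNP.
VARIATION_DBS = ("clinvar", "gap", "dbvar", "gtr", "medgen", "omim", "snp")


def build_esearch_urls(term: str, dbs=VARIATION_DBS) -> dict:
    """Return a {database: URL} mapping for the same search term."""
    return {
        db: f"{EUTILS_ESEARCH}?{urlencode({'db': db, 'term': term})}"
        for db in dbs
    }


for db, url in build_esearch_urls("right ventricular cardiomyopathy").items():
    print(db, url)
```

Each URL can then be fetched with any HTTP client, subject to NCBI’s usage guidelines.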

But if you go to this All Databases search, you’ll see the Health category there, which contains the databases I mentioned. SNP and dbVar, the databases which are more variant-based, are in that genome category. However, I just want to point out that a lot of times it’s easier to search a different database than to try to search those directly. In particular, if you’re interested in a gene, which is often the case, you might be interested in mutations or variants in that particular gene. The way we’re going to do that today is to get to a gene very quickly by using a gene sensor in PubMed. Here’s a gene associated with a form of Charcot-Marie-Tooth disease. Gene records are very large; they aggregate a lot of data and information.

There’s a phenotypes section and a variations section in each one of these records that address directly the kinds of things that we’re interested in. Under phenotypes, I can link to GTR in a number of different ways. There is a way of getting to something called PheGenI, which I’ll mention in a minute as a separate tool for genome-wide association studies. Here is a link to the condition record in GTR for CMT disease type 1A. There is a link to the MedGen concept.

There is the link to OMIM, and the link to the Genetic Testing Registry if you want to see the labs that offer tests for this. Similarly, there is a variation section that will take you to ClinVar, dbVar, the Variation Viewer, and the 1000 Genomes browser. The Variation Viewer and the 1000 Genomes browser are the two genome browsers that are dedicated to particular purposes, and we’re going to use both of them in a few minutes. Another page that sort of stands apart from the rest of NCBI, but that I think is worth mentioning because it is very handy, is this variation portal page.

To be frank, the easiest way to get there is to Google it by searching for “variation NCBI.” This page provides links to all those resources we talked about as well as a number of tools and the viewers that I mentioned a minute ago. Here’s a slide delineating those. The Variation Viewer and the 1000 Genomes browser are both dedicated browsers. We have a couple of other kinds of those at NCBI, and we have a genome browser playlist on YouTube that goes into some detail about how to use them. There are some other tools here. The Variation Reporter, which I hope to have time to demonstrate for you today, will take a set of variants that you have and tell you whether they are known.

It also reports the functional consequences for the uploaded data: in other words, whether a variant will affect the coding region of the gene or cause a change in the amino acids. Clinical Remap is a way of mapping variants onto the RefSeqGene records, which is particularly useful because they are stable platforms for mapping variants, and it is an important part of the way ClinVar maps variants. And then there is PheGenI, which is a browser tool for genome-wide association studies as well as analyses that are present in dbGaP, and we’ll use that tool as well.

So this is sort of a general page about where to go to get some more information. The Learn Page is a place to go that has links to our Webinars and Courses page. There are a number of fact sheets available on our ftp site. Of course, the YouTube channel is there. And we have a Help Desk. Bonnie and I are both people that sit on the Help Desk, so if you have a question about anything, you can write to us there.

One thing I should mention is that I abbreviate links in my slides, and it may not be clear what the NCBI in angle brackets means. It just means that’s where you put the URL for the NCBI home page, www.ncbi.nlm.nih.gov. Likewise, when I put FTP, that’s the NCBI FTP site base URL, ftp.ncbi.nlm.nih.gov. So let’s pause here and we’ll take a few questions if there are any, Bonnie. There is one. Somebody asked about the differences between GTR and GeneTests, and to be honest with you I don’t feel qualified to go through the answer to that question.
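The abbreviation convention can be written out as a small Python helper. The two host names are the ones given above; the `https://` and `ftp://` schemes and the example path are assumptions added for illustration.

```python
# Placeholder -> base URL. Host names are from the talk; the schemes are assumed.
BASE_URLS = {
    "<NCBI>": "https://www.ncbi.nlm.nih.gov",
    "<FTP>": "ftp://ftp.ncbi.nlm.nih.gov",
}


def expand_slide_link(link: str) -> str:
    """Replace the angle-bracket placeholders used on the slides with full base URLs."""
    for placeholder, base in BASE_URLS.items():
        link = link.replace(placeholder, base)
    return link


# Illustrative path only.
print(expand_slide_link("<NCBI>/gene"))
```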

So I’m going to defer that one. We have a Q&A document from this webinar, and we’ll put an answer in there. Is there another one, Bonnie? Someone is asking about batch submission to dbSNP versus ClinVar, and I’ve asked an NCBI staff member, Lon Phan, who is the lead of the SNP group. There is information about batch submissions, but where the data go depends on whether they are clinically significant or not. I’ll send that out to everyone in the Q&A. My understanding is that if you’re going to submit clinical information, I don’t know that it’s a requirement, but it would certainly be a preference, if not a requirement, that it go to ClinVar. But we can verify that.

Lon says that if it’s clinically significant it needs to go to ClinVar. Okay, so let’s proceed, and we’ll go through some examples that will take us through these resources. I was going to leave the PowerPoint slides, but actually I’m not, because I’ve outlined what we’re going to do. So we can review that together before we proceed to the web browser. We’re going to work with a particular gene.

It’s called DSG2, desmoglein 2. This is a gene on chromosome 18. It’s part of a little family of genes. Their products are involved in cell-cell junctions. And there are some horrible conditions that can arise from mutations in this gene because they affect the cardiac muscle.

What we’re going to do is take a gene-based approach to this, so we’re going to look in Gene first. Then we’re going to load the gene into the Variation Viewer as a platform for looking at the variants, which is a very useful way to do that. We’ll try the Variation Reporter as a little side trip, because I have a little file of variants that I’ve made up, and we can see what kind of consequences they have. I’ll also show you that you can upload them into our Variation Viewer and indeed into any of the genome browsers or graphical sequence viewers that we have. We’ll look at some variants in DSG2, including a pathogenic copy number variant. We’ll look at a single nucleotide variant that’s pathogenic. From there we’ll travel to ClinVar, look at that record, and we’ll look at MedGen and GTR for arrhythmogenic right ventricular cardiomyopathy, which is the condition associated with this particular single nucleotide variant.

Then we’ll look at a SNP that’s not a disease-causing or disease-associated variant, and that is actually present in the 1000 Genomes Project. Then we’ll depart from DSG2 and quickly take a look at some association results, which are a different idea from the sort of assertions that are present in ClinVar, and show you those things that are linked to a condition. So, with that, let me escape from the PowerPoint. I’m going to go over here to a web browser. One of the things that I’m going to need is a file, so I’m going to open another tab over here. I’ll go back to my PowerPoint file for a moment and get my compressed URL so I can get to a file in here that we’re going to use. I actually have some Human Genome Variation Society text here that I want to use in a little while, so we’ll come back to that.

So what I want to do is get to the DSG2 gene. But I’m going to cheat, and I actually cheated this way in the earlier webinar today. I’m going to go to PubMed this time. Last time I went, or was trying to go, to Nucleotide to do this. PubMed, Nucleotide, and Protein all have these gene sensors built into them that give you handy shortcuts. I find them very convenient.

They save me a click anyway. If I search for the human gene symbol, I get a nice little ad. It gives me the articles that are linked to the gene record.

It also gives me a direct link into the human gene. I could get the other organisms if I wanted to. GTR actually has an ad here too, so they have some genetic tests associated with DSG2, and we’ll see some of those in a few minutes. I’m going to go ahead and click through to the DSG2 gene. So this is a human gene record. It has lots of information on it. There’s a summary here that tells us that this gene is associated with arrhythmogenic right ventricular dysplasia, familial, 10, and we’ll talk more about that in just a few minutes.

You can also see it down here in the graphical sequence viewer, and if you look at the little thumbnail up here of the region around DSG2, you can see it’s part of a little family of desmogleins. There’s another similar set of genes nearby that are called desmocollins. It’s a really interesting gene family on chromosome 18. Now what I’m going to do is jump down here using the table of contents in Gene, which is a handy way to jump around. We’re going to be looking at these two sections here, the phenotypes and the variations sections. One of the things that sort of hits you in the face when you come down here to the phenotypes section is that there are professional guidelines from ACMG about this particular gene: if there is some kind of incidental finding when a genetic test is done on a person and a disease-causing mutation is discovered in DSG2, the guidelines suggest reporting that to the person because it is a potentially life-threatening condition.

And that life-threatening condition is called arrhythmogenic right ventricular cardiomyopathy type 10. We’re going to visit these later, but here we have a link to the condition page in GTR. There’s the condition and the concept in MedGen. There’s a GeneReviews article there about this.

These are all good places to go to read more about that. What I’m going to do here, though, is leave Gene by going into the Variation Viewer. This variation section of the Gene page has links to other databases. It also has links to two of the browsers. One of them is the Variation Viewer, and the other one is the 1000 Genomes browser. We’re going to look at both of those today. Notice that I can pick which human build I want to work with for the Variation Viewer.

The current one is build 38. The 1000 Genomes data is mapped only on build 37. I’m going to click on the Variation Viewer link here. This has a sort of standard look that we use for our various kinds of genome browsers. We’ll see that the 1000 Genomes browser looks pretty similar. The page is divided up into sections.

There are these little wrappers around the graphical sequence viewer that is built into the middle of it. Notice that with (inaudible) I have a search box with lots of (inaudible) searches I can do. Notice that I can upload my own data if I want to, and we’re going to do that in a moment. If there are any particular issues with the assembly in the region, it will let you know about that. It’s on the left-hand side here. At the bottom there’s this table, which we’re going to come back to in a minute, that lets me filter the variants that are present in the viewer by different useful categories. Up here is a navigational aid that will let me pick the transcript that I want. If I zoom out there might be more than one gene in the region.

If I zoom in and out, I can use this region link to go back to where I was before. Right now we’re in this Gene With Pad view. Notice that the cytogenetic map up here shows me that I’m basically at band q12 on chromosome 18.

So let’s take a brief look at some of the tracks that we have here, starting with the human gene tracks. This is the DSG2 gene here with its exon-intron structure. This particular view shows me a collapsed summary of the various transcripts. There are variations down here that are in ClinVar. Notice that this is color coded.

If I ever want to know what a track does or what the colors mean, I click on this track legend link here, and it will tell me. So I can see that if I have a completely purple box there, at least one SNP in that region has a pathogenic allele. If it’s green, it means that the allele is not probable-pathogenic or pathogenic. Those alleles are benign, and we’ll see one of those in a little while. This is a track of interest if we’re interested in variations. The other kind of variations here are the large-scale variations, so there are copy number variants.

Again I can see the track legend to sort of understand what these colors are trying to indicate to me. So, for example, the blue ones represent a copy number gain, and the red ones represent a copy number loss. I’ll close this legend now. Now before we go down and start filtering variants, I’d like to do just a little bit of work with that variation page that I mentioned to you, and we’ll do some analysis of some variants that I have. So I’m going to pop open a new tab over here. And I’m going to find that variation page that I mentioned.

I’m just going to type “variation NCBI.” Let me make that a little bigger. I can just click on that, and that’s the page that I showed you in my slides a minute ago. I have access to some tools here, including the Variation Viewer, where I already am. One of the tools that’s interesting here is the Variation Reporter, so I’m going to open that one.

So I have this interface here that will let me upload some data. Notice that it knows what I’ve been doing, so it actually populates this with various kinds of data that are loaded onto one of the viewers, such as the primer pairs that I saw in the previous webinar. I can pick my assembly, and I can click the plus sign here to add some data. What I can do now is paste some kind of mapping data in there. I’m going to use Human Genome Variation Society format for some variant calls that I have up here on the ftp site, and you can get access to those yourself. These map onto chromosome 18, and they’re in the region of the DSG2 gene, so I’ll copy those.

I’ll paste them into the variation reporter. I’ll go ahead and upload them. Then I’ll click Done.

So now they appear up here in my list of data. So what I’m going to do is just select that set of data, and I’m going to submit them for analysis. It will give me a nice summary of my results here. I submitted six variants.

Three of them matched NCBI IDs. Some of them were novel, meaning that we hadn’t seen that particular variant at that position. Some of them were novel alleles at novel locations. So that report is shown to me here. The first one is mapped here on the graphical sequence viewer for me to look at.

It’s kind of a wide output. But it shows me what kind of changes are taking place. So these are synonymous substitutions. I have some that are non-synonymous, so they would change the protein sequence at that position. And these are ones that don’t exist currently in dbSNP.
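The difference between synonymous and non-synonymous substitutions can be sketched in a few lines of Python using the standard genetic code. This is a toy illustration, not how the Variation Reporter works internally: the function name and labels are made up, and it compares two codons directly rather than mapping a variant onto a transcript.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop lost"
    return "missense"  # non-synonymous: a different amino acid


# CGC (arginine) -> CAC (histidine) is a G-to-A change that alters the protein.
print(classify_codon_change("CGC", "CAC"))
# CGC -> CGT still encodes arginine.
print(classify_codon_change("CGC", "CGT"))
```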

They would be what many people would like to call “novel” alleles, or novel SNPs. So that was a departure from where we were in the Variation Viewer, but I wanted to show you that tool. I will show you the same kind of thing over here in the Variation Viewer. What I want to do is make a new track in our Variation Viewer, so I’m back over here. I’m going to go ahead and upload some data, so I go to my data and expand that, and I can add text. Now I could have uploaded a file, of course, but I already had the text on my clipboard.

I can upload that. I can call it something if I want to. I’ll call it My SNPs. And actually there they are.

So this track down here shows me the positions of my SNPs in the DSG2 gene. I can zoom in, which is a useful thing to be able to do in the graphical sequence viewer. Notice I can zoom on a range by dragging out a region in the ruler at the top, so I’m zooming in on those particular SNPs, which are in these two exons of DSG2. Notice that I can jump to the exons here by using the exon navigator if I want to. It’s a useful way of navigating across a particular gene. That also exists in the other viewers.

So I don’t really need that track any more. I just wanted to show you that there’s a way to upload your own data. I’m going to get rid of it by pushing that X. Another thing that’s handy to be able to do: notice that I’ve zoomed in, so we’re not seeing the entire DSG2 gene. I’m going to change the region back with Go to Gene With Pad, and that puts me back where I was. Now let’s go down here, and we’re going to filter the SNPs that we’re showing to get some useful results.

So, for example, I could check all the ClinVar SNPs here. These are the ones that are going to have some kind of pathogenicity or other assertions about the phenotype. I check ClinVar Yes, for example.

Then I have 198 of them here. That’s quite a number of them, though it’s not a terrible number. One of the things that’s very useful about this page is that I can download them.

So I might like to download that data, and I’ll just show you first of all that there’s a lot more data here than what’s shown in this top line. If I click this little arrow here, I can show all the alleles for the variants. And if I download the data, I’m going to get that for all 198 of them.

So, we’ll just go back, and I can download this in XML or as tab-separated values. The tab-separated file would simply be a sort of table that I could load in Excel if I wanted to. The XML I would need to parse somehow. I can also select variants based on other criteria over here: by pathogenicity, what kind of variant it is, molecular consequence, minor allele frequencies from three different sets of data (we’re going to do this in just a little bit), and also whether there are publications associated with it.
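Besides Excel, a tab-separated download can be read with a few lines of Python. The sketch below is only an illustration: the column names and the three sample rows are invented and do not reproduce the actual layout of the Variation Viewer download, so adjust them to match the real header line of your file.

```python
import csv
import io
from collections import Counter

# Invented sample data standing in for a downloaded file; not real records.
SAMPLE_TSV = (
    "variant_id\tvariant_type\tclinical_significance\n"
    "example_1\tsingle nucleotide variant\tpathogenic\n"
    "example_2\tsingle nucleotide variant\tbenign\n"
    "example_3\tdeletion\tpathogenic\n"
)


def load_variants(tsv_text: str) -> list:
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def count_by(rows: list, column: str) -> Counter:
    """Count how many rows fall into each value of the given column."""
    return Counter(row[column] for row in rows)


rows = load_variants(SAMPLE_TSV)
print(count_by(rows, "clinical_significance"))

# The same kind of filter the viewer offers, applied after download.
pathogenic_snvs = [
    row["variant_id"]
    for row in rows
    if row["clinical_significance"] == "pathogenic"
    and row["variant_type"] == "single nucleotide variant"
]
print(pathogenic_snvs)
```

For a real file, replace `io.StringIO(tsv_text)` with an opened file handle.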

So that will work with the data that are in ClinVar. Suppose we want the ones that are in dbVar here, just to look at some of the large-scale variants. I’ll click that one. Let’s reduce it to just the ones that are pathogenic. There are 17 of them. Notice that these are fairly large insults to the genome; most of them affect many genes, in some cases hundreds. They are listed for me here.

I can expand them to look at them more closely if I want to. So here’s one that we could look at. I’m not going to go through ClinVar on this particular round. We’re going to do that with a single nucleotide variant. But here’s the ClinVar information, and that one links me to ClinVar with its pathogenic assertions. Notice that there are three different conditions that are associated with this. In terms of causation, remember that this is a large variant, so it’s not necessarily the same kind of association that you might get with a small variant. I’m just going to show you what one of these looks like by going to dbVar here, so you can get a feel for it.

So we’re now in dbVar looking at their browser, and you can see that this is actually an enormous region. When I look down here at the gene track, you can see all the genes that are involved. This is a duplication. In fact we can expand this to see that there are three different variant calls here. Let me just show you that you can change the settings for a particular track by clicking on the top of the track, and this setting shows all of them. This is coming from some kind of clinical genetic testing lab, and these individuals have three copies of this region of the genome. So let’s go back, and we’ll look at some of the single nucleotide variants that are in the Variation Viewer for this particular gene.

So I want to change my filter over here from dbVar to dbSNP. Ones that are in ClinVar, you know, look at pathogenic ones. I’m going to be interested in single nucleotide variants. And I’m going to be interested in missense variants. And there are ten of them. I want to take a look at this one here. Expand the length of that one.

This is a change that changes the transcript at position 146 from a G to an A. It shows you the rest of the effects. It changes an arginine to a histidine. And it’s associated with this condition, the one that’s bad that we’ve talked about. Any of these links will take me to ClinVar. This has got a two star review status. Criteria have been provided for their assertion, and there have been several submitters with no real conflicts. Notice that I can find this one in the viewer by clicking on the position.
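
To make the link between a G-to-A change and the amino acid change concrete, here is a small Python sketch using the standard genetic code. It assumes that position 146 is counted along the coding sequence, which would put the change at the second base of codon 49; the actual codon in the transcript is not stated in the talk, so the sketch simply lists what a G-to-A change at the second base does to each arginine codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def codon_location(cds_position):
    """Map a 1-based coding position to (codon number, base within codon)."""
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3 + 1


def g_to_a_at_second_base(codon):
    """Return the codon with its middle G replaced by A."""
    if codon[1] != "G":
        raise ValueError(f"{codon} does not have G at its second base")
    return codon[0] + "A" + codon[2]


print(codon_location(146))

arginine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "R")
outcomes = {c: CODON_TABLE[g_to_a_at_second_base(c)] for c in arginine_codons}
for codon, new_aa in outcomes.items():
    print(codon, "->", g_to_a_at_second_base(codon), new_aa)
```

Of the six arginine codons, CGT and CGC become histidine codons, CGA and CGG become glutamine, and AGA and AGG become lysine, so an arginine-to-histidine result is consistent with a CGT or CGC codon at that position.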

This is the one we’re interested in, pathogenic. What will be interesting is to see how this affects this particular protein sequence. I can change the settings for the gene track by mousing over it and clicking on that little gear icon again like I did with the dbVar track. And I’m going to Show All. It’s going to show me all the transcripts and proteins.

And I can project the SNPs, so I can see that there are six here that affect the coding region and that we’re interested in right here. In ClinVar, it changes the arginine at that position to a histidine. And you can see that here, a little call out. Let’s follow this through over to ClinVar. And so here we have the official name for this variant in terms of how ClinVar calls it. It has a two star rating. Associated with two different conditions, but they are very closely-related conditions. One of them is the more general one.

So in thinking about the way ClinVar does things, notice that there are two different records that are part of this. And one of them has two submitters and one of them has one. So this is sort of an aggregation of all the information about this particular variant. These are the HGVS expressions that show you how this thing is mapped to the different reference sequences including the Locus Reference Genomic sequence and the corresponding gene record. And then here is information about the submitters, what the conditions are. So we can follow one of these links over here to MedGen to read more about arrhythmogenic right ventricular cardiomyopathy type 10. That’s what I’m doing here.

I’m leaving ClinVar and going to MedGen. This is an aggregate of information including an excerpt from GeneReviews. You can link to the entire article there.

Next excerpt from Genetics Home Reference that you can link to there. Clinical features. A Term hierarchy from GTR, or Genetic Testing Registry.

And a link to get a test for this particular condition. These professional guidelines here, so there’s links to the ACMG recommendations. And these are various kinds of PubMed searches that get information in various categories about this particular concept or condition. What I want to do now is just a quick trip over just – so you can see what Genetic Testing Registry is like, so I’m going to click on the link here for the Clinical Test. That’s what this “C” does in the GTR column.

So if I’m a physician that’s interested in ordering this particular kind of test, I can select my condition. It’s already been checked by the clinical test. Suppose I want one that is sequence analysis of the entire coding region. I want to find – it rewinds the page when I do those. They have certified labs, they have licensed labs in the United States. And if I wanted to compare these various labs, I can click on this Compare Labs tab over here. It will give me a table that gives me the comparison of the different things that they offer.

And over here I can select the labs that I want to compare. For example, DDC and Emory Genetics. I can just show their tests and I can see what kinds of things they offer.

Okay, so I want to go back now. We’ve looked at pathogenic variants, and now let’s see if we can look at a variant that’s not pathogenic, and that’s in 1000 Genomes. Let me go over to the variation viewer. I need to bump myself back out so I’m looking at the entire gene again, go to gene with pad. So now I can filter this in different ways.

Now I can take off my pathogenic filter. I’m still interested in single nucleotide variants. I’m interested in missense variants.

And what I’m interested in is the common polymorphism, and so there are several different sequencing projects here that have data about exome sequencing. The one we’re going to be interested in in this particular case is the 1000 Genomes Project data. I’ll select the one that has a minor allele frequency of greater than five percent. So I can read about this. This is actually in ClinVar as well, and it’s specified as benign. It changes an arginine at that position to a lysine. This is a conservative kind of change.

And what I’m going to do now is visit a different database, one we haven’t visited before, and that’s dbSNP. This is a chart that will show you what’s available in dbSNP and to show you how these things are made. I’m going to scroll down so you can see that there are lots of submissions for this particular SNP, and they’re all aggregated into this one RefSNP record.

They can be instantiated as a sequence if you want, that’s the way it’s written here. But another way of doing it is simply to write the HGVS expression. It’s shown at the top here, the kind of sequence changes that this change would produce. There are actually two genes in this region, one’s on the opposite strand. It’s the non-coding one, that’s why it’s shown in two ways there. This link here takes me to this particular place in the 1000 Genomes browser, so that’s where I’m going to go for now.
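
For readers who want to handle HGVS expressions in code, here is a minimal Python sketch that pulls apart the simplest case, a single-base substitution on a coding sequence. The regular expression covers only that one form, and the accession `EXAMPLE_ACC.1` is a placeholder, not a real reference sequence; full HGVS nomenclature has many more forms than this.

```python
import re

# Matches only simple coding substitutions such as "ACCESSION:c.146G>A".
# This is a deliberately narrow sketch, not a full HGVS parser.
SIMPLE_SUBSTITUTION = re.compile(
    r"^(?:(?P<accession>[^:]+):)?c\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_simple_substitution(expression):
    """Return the parts of a simple c. substitution, or None if no match."""
    match = SIMPLE_SUBSTITUTION.match(expression)
    if match is None:
        return None
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    return parts


print(parse_simple_substitution("EXAMPLE_ACC.1:c.146G>A"))
print(parse_simple_substitution("c.146_147del"))
```

Anything the pattern does not recognize, such as a deletion or a protein-level expression, comes back as `None` rather than being guessed at.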

You notice I’ve loaded a different browser. It’s very similar in spirit to the one that we were using a moment ago, but this has a different kind of table in it. It shows me I’m zoomed in to this particular exon on (inaudible), which of course is DSG2. And here is a variant table, or genotype table, with a different population. So over all, the minor allele frequency, the minor allele is the A. The reference genome has the G in this particular case.

Overall it’s 24%. This is kind of an interesting SNP in that it shows some population stratification. One of the things that you can notice if you glance through the populations here is that the African populations have – it’s still common SNP, but they have lower frequencies.

So, for example, the Esan in Nigeria, they have an allele frequency of 2.5%, whereas some other populations have much higher allele frequencies. Like the Chinese, Asian populations have maybe a 40% for the minor allele. I’m going to pick sort of a middle-of-the-road population and one that is well studied, and that’s the Utah residents. These are in the super population of European ancestry people. I’m going to expand this so I can look at the samples for the individual people. And a person who is used, or whose sample is used, in a lot of studies has the identifying number NA12878. It’s a half-Basque individual. She’s a woman.
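
The per-population allele frequencies in a genotype table like this one follow directly from the genotype counts. Here is a minimal Python sketch of that arithmetic for a two-allele site with reference G and alternate A; the counts are made-up illustrative numbers, not values from the 1000 Genomes browser.

```python
def alt_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Frequency of the alternate allele from diploid genotype counts.

    Each individual carries two alleles: a heterozygote contributes one
    alternate allele and an alternate homozygote contributes two.
    """
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    return (n_het + 2 * n_alt_hom) / total_alleles


# Illustrative counts only: 60 G/G, 35 G/A, 5 A/A individuals.
frequency = alt_allele_frequency(n_ref_hom=60, n_het=35, n_alt_hom=5)
print(f"{frequency:.3f}")
```

With these toy counts there are 45 copies of A among 200 alleles, which is a frequency of 0.225.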

And I can check this box here. Notice that she is a heterozygote. I can actually load the next gen alignment of the exome reads for her.

The reads in that position do identify her as a heterozygote. So I’m going to click that box. And you can see the sort of mound of stuff here that represents the alignments of those exome reads. What I can do here is click on the marker and zoom to the sequence of that marker. And you can see data that indicates why they thought that she was heterozygote.
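
The idea behind reading a heterozygote off the alignments can be sketched in a few lines of Python: count the bases that the reads show at the variant position and see whether two alleles are each well supported. The pileup string and the 20% threshold are illustrative assumptions; real genotype callers also weigh base and mapping qualities, which this sketch ignores.

```python
from collections import Counter


def allele_counts(pileup_bases):
    """Count the bases observed across reads at a single position."""
    return Counter(base.upper() for base in pileup_bases)


def looks_heterozygous(pileup_bases, min_fraction=0.2):
    """True if exactly two alleles each reach min_fraction of the reads.

    min_fraction is an arbitrary illustrative threshold.
    """
    counts = allele_counts(pileup_bases)
    total = sum(counts.values())
    if total == 0:
        return False
    supported = [a for a, n in counts.items() if n / total >= min_fraction]
    return len(supported) == 2


# Toy pileup: ten reads covering the position, six with G and four with A.
toy_pileup = "GGAGAAGGAG"
print(allele_counts(toy_pileup), looks_heterozygous(toy_pileup))
```

A position where nine of ten reads show G and one shows A would not pass this check, since a single discordant read is more likely to be a sequencing error than a second allele.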

The reads are aligned to the genome, and you can see the actual variants at that position. The last example, I’m going to go back to the variation page. We’re going to switch gears altogether and we’re going to use PheGenI. We’ll get some association results. So these are SNPs that are linked to a particular condition. They’re not necessarily causative, but are somehow linked, perhaps on the same haplotype. This data comes from mainly the NHGRI's GWAS catalog, which is a literature database. Also some analyses that have been run in dbGaP.

So I’m back at the variation page. I’ll click on the phenotype genotype integrator. And notice that I can type a number of different conditions. These are MeSH terms for various conditions.

The one that I’m going to pick here is asthma. It’s an inflammatory problem that people have with their respiratory tract of course. I can pick what my genome wide P value is here. I’ll pick it arbitrarily as ten to the minus 8, which is a pretty decent one.

Then I can go ahead and run this across all the different data that are here. We’re going to focus on the association results. I have 33 of them. So I have a bunch of different SNPs listed here, from a bunch of different sources.

The ones that have phs, these are accessions from dbGaP. The ones that are the NHGRI source, these are from the Genome-Wide Association Studies catalog. These are ranked by P value. Notice that the most significant P value, and it’s quite significant, is for the gene NOTCH4. NOTCH4 is not something that you would think of as being involved directly in asthma.

It’s not involved in inflammation directly in any way. One of the things that’s interesting to look at here is we can sort these by location, and that gives us some additional information. So I’m going to sort this by where they are in the genome rather than by their P value. You’ll notice that we have sort of a really large set of SNPs that map to a particular region on chromosome six.
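
The two views of the association table, ranked by P value and sorted by genomic location, can be sketched in Python as follows. The records are invented placeholders (the SNP names, positions, and P values are not PheGenI results); only the threshold of ten to the minus 8 comes from the search above.

```python
from typing import NamedTuple


class Association(NamedTuple):
    snp: str
    chrom: str
    pos: int
    p_value: float


# Invented placeholder records for illustration only.
RESULTS = [
    Association("snp_a", "6", 32_000_000, 1e-23),
    Association("snp_b", "17", 38_000_000, 3e-10),
    Association("snp_c", "6", 31_500_000, 5e-9),
    Association("snp_d", "2", 102_000_000, 2e-6),
]


def chrom_key(chrom):
    """Sort numbered chromosomes numerically, then named ones (X, Y, MT)."""
    return (0, int(chrom)) if chrom.isdigit() else (1, chrom)


threshold = 1e-8
significant = [r for r in RESULTS if r.p_value <= threshold]
by_p_value = sorted(significant, key=lambda r: r.p_value)
by_location = sorted(significant, key=lambda r: (chrom_key(r.chrom), r.pos))

print([r.snp for r in by_p_value])
print([r.snp for r in by_location])
```

Sorting by location puts neighboring SNPs next to each other, which is what makes a cluster in one region stand out in a way that the P value ranking hides.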

You’ll recognize the names of some of these HLA genes here. So this is the HLA region on chromosome six. Also contains some components of the complement cascade. This is a pretty likely region because these are genes that are involved in inflammatory response. And so this SNP could conceivably be linked to a haplotype that’s associated with asthma.

Let’s go ahead and link through to the NOTCH4 SNP, our second foray into dbSNP. I want to show you a track that’s kind of interesting in this regard, and that’s they’ve mapped association results onto the genome. So here is this association result with asthma. And actually this graph here tells you what the P value is. Let me show you something that I think is kind of interesting. This region is covered with associations. I’ll kind of zoom out. And this is using the zoom graphics on the graphical sequence viewer to do that.

If you mouse over a lot of these, you’ll notice that they are associated with things that are involved in central immune system dysfunctions, psoriasis, lupus, asthma, Multiple Sclerosis. And you could continue to zoom out if you wanted to. You’d see that there are a lot of genes in this region, chromosome six, that are linked to those kinds of immune system dysfunction.

And one of the things I also just want to point out in passing while we’re here. This happens to be a highly variable region of the human genome, so we have generated – GRC has generated – a lot of alternative loci to represent different versions of this region in various human genomes. Okay, so those are all the things that I wanted to show you. Probably too many things.

But let’s stop here, and we’ll keep it open to answer questions. If you need to go, please – thanks for coming, but you can go, but I’ll answer any questions people have right now. And we have until 4:00 scheduled? No, we have until – well, yes, it’s supposed to end at 3:50. Okay. But I can stay longer if you can stay longer. I can stay longer. So one question is, can I intercompare the SNPs and indels in – and then he lists three different NA individuals, and HG19 using variation viewer? So the NA12880 as an example, and other individuals.

And I know you can bring up the individual NA tracks in the 1000 Genomes browser, but I wasn’t sure if that would automatically include the HG19 data. So HG – remind me – the numbers confuse me sometimes. HG19 what – is that 37 or is that 38 – I’m not sure. I’m not sure which one it is. The 1000 Genomes data have been mapped on – the browser they’re mapped onto 37. They haven’t loaded the 38. Okay, so somebody has Chatted to us that it’s 37.

37 data are there in the 1000 Genomes project. I didn’t point this out, the variation viewer also allows you to – the variation viewer you can pick your assembly. So right now we’re looking at 38 because that’s the one I picked.

You can change that to 37. There’s a pull down list right here, so you can change it to 37. Okay. Another question.

Is the mode of inheritance reported anywhere in ClinVar rather than in OMIM? So the mode of inheritance is not always reported but it is reported – I’m not sure what they mean by other than in OMIM. Well, OMIM definitely has the mode of inheritance. The question then is, does ClinVar display the mode of inheritance? And I know, because I looked it up in the ClinVar fact sheet, that it’s encouraged to be part of your data submission. But I’m just not familiar with where on the ClinVar pages it would show the mode of inheritance. It would show it in that table that I showed you at the bottom of the record I had a minute ago. I’d have to go back to that record.

I think it’s shown somewhere here. Yeah, maybe not. Okay. I’m going to bring up two – By mode of inheritance do they mean like – autosomal recessive? Yeah. So in this case it’s shown for the Conditions, where it says mode of inheritance. In this case it does have it in – and so it will have it in the conditions and then the mode of inheritance in parentheses. So the question, I guess, is where is that coming from, and I don’t know.

It may be coming from vocabulary, taken from OMIM, I’m not positive about that. Or if it was part of the data submission. It could have been part of the data submission. Okay.

So one more question is whether when you download the data from ClinVar, if the results could be for GRCh37 when the downloaded results always seem to default to GRCh38. And if you can’t answer that, Peter, I have another one that I wanted to just bring up so the listeners knew that these were the asked questions. And that is whether VarView is synchronized with ClinVar so that if the clinical significances change – In the variation viewer? Yeah, the variation viewer. Those tracks should be updated, yes. So if a clinical significance data changes from uncertain to pathogenic, how quickly will that be reflected in the variation viewer? I don’t know how quickly that would be done. Okay. And for all of these questions that we’re not able to answer today, we will definitely check with our colleagues and add the information in the Q&A document that we will be posting after the webinar.

That’s right. Now one other question that was previous to this one was about downloading ClinVar data. So I’m at the ClinVar FTP site here. And you can get the VCF file mapped onto 37 from this place.
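
Once a ClinVar VCF file has been downloaded, each data line can be split into its fixed columns and its INFO annotations. Here is a minimal Python sketch of that step. The sample line is invented for illustration (including the `EXAMPLE_FLAG` key); `CLNSIG` is the INFO key that ClinVar's VCF uses for clinical significance, but check the header of the file you download for the keys it actually defines.

```python
def parse_info(info_field):
    """Split a VCF INFO column into a dict; bare flags map to True."""
    info = {}
    for item in info_field.split(";"):
        key, separator, value = item.partition("=")
        info[key] = value if separator else True
    return info


def parse_vcf_line(line):
    """Parse the eight fixed VCF columns of one data line."""
    chrom, pos, variant_id, ref, alt, _qual, _filter, info = (
        line.rstrip("\n").split("\t")[:8]
    )
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt,
        "info": parse_info(info),
    }


# Invented example line; not a real ClinVar record.
toy_line = "18\t100\t.\tG\tA\t.\t.\tCLNSIG=Pathogenic;EXAMPLE_FLAG\n"
record = parse_vcf_line(toy_line)
print(record["chrom"], record["pos"], record["info"].get("CLNSIG"))
```

When reading a whole file, lines that start with `#` are header lines and should be skipped before calling `parse_vcf_line`.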

I don’t know if that’s what the person was asking about or not. But certainly those data are on both the current build and on the previous build. And in fact we must have it because that’s why you can show the variation viewer with the two builds off this. So Des Moines University, which is sort of what this webinar is targeted for, Vanessa asked me if we could stay around and answer questions from them, too. I don’t know if we’re hearing from them or not. I don’t think so. I mean, I’m definitely not seeing any questions from Vanessa herself. Okay.

Hello, Dr. Cooper this is Becky at Des Moines University, and we do have a few questions. Okay. We just wanted to wait until an appropriate time for you if that’s all right. Okay. We have one question here from one of our faculty. Okay, sure.

Hi. The question is, if you are in ClinVar and you see a particular SNP that’s pathogenic and you see that it’s a missense mutation, for example, is there a simple way to move from this information to a protein structure and where that amino acid might be located? Yeah. Maybe I’ll rephrase – a straightforward way? Well, I guess it depends on what you mean by that. So there is a mechanism built into our system. But it depends – what it depends mainly on is the availability of the structures for that particular protein. So in principle you can go to the reference sequence. You could – you know where the variant is on the reference sequence.

Let me get a sequence in front of me, okay? I’d like to be there. I want to be here. So what I’m going to do is retrieve a particular protein sequence. So I'm going to retrieve - this is an MLH1 protein, which contains SNPs in ClinVar. So if you have a protein like this, so this would be the RefSeq, the protein upon which those SNPs are mapped. And then what you have to do is go down here to a link to structure that’s kind of an obscure one, but it’s still a straightforward one. And that’s the thing that says Related Structures Summary. So what it does is it shows you an alignment between – let me make this bigger – it shows you an alignment between the sequence that you started with and a corresponding structure.

Now this is set up to show you sort of a low redundancy view. I’m going to change it to All Similar MMDB because there is a structure for the human protein, at least part of it. So here’s an example of – so in the case of this particular protein, the only structures are for the conserved domains at the N-terminal end. So if you knew, in this particular case, this structure starts with the N-terminal methionine, the main problem is that you have to go from the coordinate system of the protein to the coordinate system of the structure, but this would be very straightforward. So you could just launch that structure.
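
The coordinate conversion described here, from a position on the protein sequence to a residue number in the structure, can be sketched in Python for the simplest case. The sketch assumes a gapless alignment between the protein and the structure, and the aligned range of 1 to 340 is a made-up illustration; a real alignment with gaps would need a per-residue mapping rather than a single offset.

```python
def protein_to_structure(residue, aligned_start, aligned_end, structure_start=1):
    """Map a protein residue number to a structure residue number.

    Assumes a gapless alignment in which protein residue `aligned_start`
    corresponds to structure residue `structure_start`. Returns None when
    the residue lies outside the part of the protein the structure covers.
    """
    if not aligned_start <= residue <= aligned_end:
        return None
    return structure_start + (residue - aligned_start)


# Illustrative numbers only: a structure covering protein residues 1-340
# and numbered from 1, as when it starts at the N-terminal methionine.
print(protein_to_structure(49, aligned_start=1, aligned_end=340))
print(protein_to_structure(500, aligned_start=1, aligned_end=340))
```

When the structure starts at the N-terminal methionine and is numbered from 1, the two coordinate systems coincide, and a variant past the end of the solved domain has no counterpart in the structure.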

You know where the amino acids are. I think that answers your question. So you’re mapping it, not doing any kind of prediction, you’re just saying this is where it is in the structure, did it hit, you know, an active site, did it hit some kind of critical residue, and I can tell from the structure which residue. Does that answer your question? Yes. Thank you. Okay. Dr. Cooper, that concludes our questions here at Des Moines University.

Thank you for taking that additional question at the end. Okay. Great. Thank you. Somebody asked a question in the Chat box, is there a way to move from the ClinVar graphics view to Primer-BLAST? Yeah, there definitely is a way to do that, and I think I understand the question. So the question is, you can take any place that you have in the graphical sequence viewer, and you can send it to Primer-BLAST.

This is not a particularly wonderful place to show that from, but I’ll do it anyway. This is the 1000 Genomes browser. But I can take any region that I want to here, and I can just send that directly to Primer-BLAST.

So I think that’s what you mean by the ClinVar graphical view. So any graphical sequence viewer implementation that exists, whether it’s embedded into one of these genome browsers or whether it’s just simply on a gene page, you can take whatever region you select and just throw it right into Primer-BLAST. And then when you get your results back, you can even load the SNP track into Primer-BLAST to see where your SNPs are relative to where your primers lie.

Okay. I think that answered the question. Okay. Let’s go ahead, and I’m going to end the session for everybody and thank you very much for coming.
