20 July 2013

Third Party DNA tools: Gedmatch.com

I feel lazy. I haven't written in this blog in a month and I feel terribly lazy. However, I've been terribly busy. Why? Well all these wonderful DNA matches keep rolling in! I received my AncestryDNA results for my maternal 1st cousin and have been putting his results into a spreadsheet with my results to see where we *don't* overlap so I can focus on cousin matches that are probably exclusive to my father (my parents do share an ancestor, so there is a lot of overlap!). It doesn't help that Ancestry doesn't have any way to look at the chromosome data so I can see where I match my matches. It doesn't help that I can't look at how my matches match others to see if there is a pattern. And it really doesn't help that I can't easily compare how many matches my cousin and I have in common.

Enter Gedmatch.com!

Gedmatch.com is a volunteer-run free website that allows you to upload your raw autosomal DNA data from AncestryDNA, 23andMe, and FTDNA to use their host of tools. I will start off with the warning that they are so backlogged with new uploads that they aren't accepting any new ones until mid-August (currently the 15th). But I've put off this post for so long, that I think it's best to just let you know about all the features and let you get ready. So what does Gedmatch do that's so special? Why will you be sitting on the edge of your chair waiting for August 15th?

One to Many
The first option for analyzing your data is the "one-to-many". This option allows you to compare your data with the entirety of Gedmatch's database. Now, I'm going to take a moment to mention IBS and IBD. Identical by Descent (IBD) is the term used when your DNA matches someone else's because you both inherited it from a common ancestor. Identical by State (IBS) is when your DNA shows a match, but it is most likely due to the random jumbling of DNA that just happens to look like someone else's DNA. How do you know if your match is IBS or IBD? General rule of thumb in genetic genealogy: a DNA segment should be at least 7cM (centiMorgans) long before you treat it as IBD. Now, 7cM is terribly small. To put it into perspective: a sibling will have roughly 2350cM in common with you. A 1st cousin will have about 800cM, a 2nd about 200... by 5th cousins, there's only about 25cM in common! So a segment match of 7 is going to be way back there. If you're looking for close relations, you'll want to look at larger segments.

Why is that all important? When you go to the "one-to-many" tab, you'll be asked what you want the minimum length of autosomal DNA segments to be for your matches. Its default setting is 7cM. Go no lower. A second length option you have to set is for the X chromosome. Now, there is value in an X chromosome match. If you match a male on his X chromosome, then your focus should be the maternal side of his tree. The default value for the X is 3cM, however. This tool will pull up any matches of autosomal DNA of 7cM OR 3cM on the X chromosome. So you could end up looking at a lot of matches that are high on the X chromosome, but below the threshold on the autosomal DNA. Does that mean you aren't related to them? Not entirely. The problem is that the X chromosome isn't gender specific. Yes, a male has only one X and gets it from his mother, but his mother got one from her father as well as one from her mother. So the variety of ancestry makes the X unreliable on its own (for right now). When you set up your defaults, set the X option higher (30 or more) to filter out any matches below the autosomal threshold. You can always look at those matches later, but your first foray will be less confusing if you leave those bits off. There is also a check box for cross referencing. I won't get into what it is, because it's unreliable. Just don't check that box.
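For the spreadsheet-minded, here is a minimal Python sketch of that "7cM OR 3cM" logic. It is only an illustration of the rule as I described it, not Gedmatch's actual code. The `Match` class, the `passes` function, the kit numbers, and the cM values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Match:
    kit: str             # hypothetical kit number
    autosomal_cm: float  # invented autosomal segment length, in cM
    x_cm: float          # invented X chromosome segment length, in cM


def passes(match, min_autosomal_cm=7.0, min_x_cm=3.0):
    """A match is listed if EITHER threshold is met (the OR described above)."""
    return match.autosomal_cm >= min_autosomal_cm or match.x_cm >= min_x_cm


matches = [
    Match("A000001", 12.4, 0.0),   # autosomal only
    Match("F000002", 4.1, 9.8),    # X only, below the autosomal threshold
    Match("M000003", 8.0, 35.2),   # both
]

# With the defaults, the X-only match sneaks into the list.
default_list = [m.kit for m in matches if passes(m)]

# Raising the X threshold to 30 leaves only matches that clear 7cM autosomal
# (or have a very large X segment).
stricter_list = [m.kit for m in matches if passes(m, min_x_cm=30.0)]

print(default_list)
print(stricter_list)
```

The point of the sketch is simply that the X threshold acts as a second door into the list, so raising it closes that door for the small X-only matches.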

This is the one to one of my best match and myself. I would focus on the chromosome 2 segment as it meets the 7 & 700 rule.

Your results will come in a table. You can hide the autosomal column or the X column if you want to focus on the numbers in one or the other. You'll see the kit number for the match (I'll explain that later), their email if they made it public, and your estimated generations to a common ancestor. There is an "L" that is a link. Click on it and you'll see a list of the kits that person matches. You can use that to compare matches you have in common, which could help you figure out which side of your family they match. The autosomal and X columns have an "A" and an "X", respectively. Click on those and you'll see the specific segments you have in common with that person.

Remember, 7cM is the minimum for IBD. But that's not the only number you need to have in your head. The other is the number of SNPs (single nucleotide polymorphisms) in that segment: 700. I try to remember it as the 7&7 rule. If I find a segment that is 7cM long, I then check if it's 700 or better on SNPs. Often, you'll match a person on one chromosome and meet these parameters and then match them on other chromosomes under the minimum. Those other matches are most likely IBS and should be ignored when you're working out how you connect.

The best part of this table is that you can select multiple kits and compare them all at the same time. I have 6 matches that all have the same email (uploaded by the same person for different family members). I checked each one and then did a comparison. It was obvious from the segment lengths that two were siblings, one was a parent, one was a cousin and the other two were children of one or the other sibling. A message confirmed this assumption, and matching all of them on a specific chromosome helped us figure out how we connected to each other and to the other tests that we matched on that chromosome segment. Since the parent was their father, and I matched heavily on his X chromosome as well as the autosomal, we were able to work on his maternal line to find the match.
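If you copy a match's segments into a spreadsheet or a script, the 7&7 rule is easy to apply mechanically. Here is a rough Python sketch. The function name `looks_ibd` and the segment numbers are invented for illustration, and the rule itself is only the rule of thumb described above, not a guarantee.

```python
def looks_ibd(segment_cm, snp_count, min_cm=7.0, min_snps=700):
    """Rule of thumb only: a segment needs BOTH 7cM and 700 SNPs to be worth chasing."""
    return segment_cm >= min_cm and snp_count >= min_snps


# Invented segments for one hypothetical match.
segments = [
    {"chromosome": 2, "cm": 14.2, "snps": 2100},   # long enough, enough SNPs
    {"chromosome": 9, "cm": 7.5, "snps": 540},     # long enough, too few SNPs
    {"chromosome": 15, "cm": 3.9, "snps": 820},    # enough SNPs, too short
]

keep = [s["chromosome"] for s in segments if looks_ibd(s["cm"], s["snps"])]
print(keep)
```

Only the chromosome 2 segment survives here, which mirrors what I do by eye: focus on the segment that clears both numbers and set the rest aside.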

One to One, X One to One, Phasing, One or Both, and Specific Segment
The next options on the home screen are "one-to-one" and "X one-to-one" comparisons. These are similar to the results you get when you click those "A" or "X" links on the "one-to-many" results, but you can better control the minimums to fine tune the comparison. To be honest, the only time I use these is when someone in a discussion group tells me their kit number and I check to see if we are matched. Same rules apply: 7 & 700. "One or Both" is the option if you want to find matches for more than one kit. When I finally am able to upload my cousin's raw data, this option will help me find the kits that match us both much quicker than running separate reports and comparing names myself.
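Until I can use "One or Both" for real, the same idea can be sketched with two lists of kit numbers, one per person. This is a minimal Python illustration using sets. The kit numbers are invented, and this is only what I would do by hand with two exported match lists, not how Gedmatch does it.

```python
# Hypothetical kit numbers copied from two separate match lists.
my_matches = {"A111111", "F222222", "M333333", "A444444"}
cousin_matches = {"F222222", "A444444", "M555555"}

# Kits that match both of us.
both = my_matches & cousin_matches

# Kits that match me but not my cousin.
only_me = my_matches - cousin_matches

print(sorted(both))
print(sorted(only_me))
```

The second set is the one I care about for my father's side: since the cousin is maternal, kits that match me but not him are the first candidates to look at, with the caveat that a cousin only shares part of my maternal DNA, so some maternal matches will land in that set too.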

Phasing is an option I haven't explored as I'm not a parent, nor has either of my parents taken the test. What phasing does is take one or both parents' kits and compare them to their child's kit to figure out what DNA is mom's and what is dad's. It takes a while for the results to come back (about 3-5 days).
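Since I haven't run this tool, treat the following as a toy illustration of the general idea of phasing and not as Gedmatch's method. It assumes both parents are tested and looks at a single SNP, where each person's result is a two-letter string such as "AG". The function name `phase_snp` and the example genotypes are made up.

```python
def phase_snp(child, mother, father):
    """Return (allele_from_mother, allele_from_father), or None if it can't be decided."""
    first, second = child
    options = set()
    # Try both ways of splitting the child's two letters between the parents.
    for from_mom, from_dad in ((first, second), (second, first)):
        if from_mom in mother and from_dad in father:
            options.add((from_mom, from_dad))
    # Exactly one consistent split means the SNP is phased.
    return options.pop() if len(options) == 1 else None


print(phase_snp("AG", "AA", "GG"))  # mom can only give A, dad can only give G
print(phase_snp("AG", "AG", "AG"))  # everyone is AG, so there is no way to tell
```

Repeat that over hundreds of thousands of SNPs and you get a sense of why the real thing takes a while, and why some positions stay undecided.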

And then there is the "Specific Segment" option. You can put in the chromosome number and the segment start and end points, with the thresholds for minimum length allowed, and a report of kits matching you specifically on that segment comes up. I've tried it with the chromosome segments that my closest matches have in common with me, but haven't found anyone with a significant match on that segment other than them. Ah well.
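The underlying question is just "does this kit's segment overlap the stretch I care about?" Here is a small Python sketch of that check. It is a simplification: it measures overlap in base-pair positions, whereas the tool's thresholds are in cM, and the function names, kit numbers, and positions are all invented.

```python
def overlap_length(start_a, end_a, start_b, end_b):
    """Length of the shared stretch between two ranges, or 0 if they don't touch."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def kits_on_segment(segments, chromosome, start, end, min_overlap=1_000_000):
    """Return kits whose segment overlaps the given range by at least min_overlap."""
    hits = []
    for kit, chrom, seg_start, seg_end in segments:
        if chrom == chromosome and overlap_length(start, end, seg_start, seg_end) >= min_overlap:
            hits.append(kit)
    return hits


# (kit, chromosome, start position, end position), all hypothetical.
segments = [
    ("A111111", 2, 10_000_000, 25_000_000),
    ("F222222", 2, 40_000_000, 52_000_000),
    ("M333333", 7, 12_000_000, 20_000_000),
]

found = kits_on_segment(segments, chromosome=2, start=20_000_000, end=45_000_000)
print(found)
```

Keep in mind that two kits overlapping the same stretch of your chromosome might match you on opposite sides of the family, so an overlap is a lead to follow up on, not proof that they match each other.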

Admixture
Admixture is the fancy name for ethnicity. When you took your autosomal test, the company you chose used their own reference populations and algorithms to calculate your possible ethnicity mix. Now, as I've pointed out before, there's a lot of guesswork involved. Let me illustrate the point for you:

The results page for each of these tests has a link to the blog or development data. I recommend viewing those links to get information and details straight from the horse's mouth. Sometimes, if not too overworked, the developer can answer a question; or someone else has already asked your question and received an answer!

These four pie charts show my ethnicities according to four different calculators: MDLP, Dodecad, Eurogenes, and Harappa. You'll note that they don't use the same names for similar areas and that even when they do, the percentages can vary wildly. Eurogenes has a chart of where the populations they use as a reference are from. Does that mean that if I match them, that my family is from there? Not really. It's just a way of defining the area. Just because the "South Baltic" ethnicity uses Lithuanians to define it, doesn't mean my family is from Lithuania. Now, I have records that indicate they are, but because of migrations and border changes, they could be Polish or Russian.... all this calculator tells me is that my DNA has data in common with the Lithuanians that were tested.

I used the K12 for each (save Harappa) to give a fair comparison. That means there were 12 possible categories to have represented in my DNA. Every calculator save Harappa has multiple options as far as how refined the results will be. Eurogenes will go up to 36 categories. Well that means we'll get better results, right? Not really. At some point, you can take your DNA to such a small amount that what you match isn't really proving your deep ancestry as much as reporting noise. Just like the matching needs to follow the 7&7 rule, ethnicity works best at 5% and above. Most people, including the developers of these tools, will tell you that K12 is your fairest bet.
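If you type a calculator's percentages into a script, the 5% guideline is a one-line filter. Here is a minimal Python sketch. The category names and percentages are invented for the example and are not my real results, and the 5% floor is only the rule of thumb mentioned above.

```python
def drop_noise(admixture, floor=5.0):
    """Keep only the categories at or above the floor percentage."""
    return {population: pct for population, pct in admixture.items() if pct >= floor}


# Invented K12-style output (names and numbers are hypothetical).
result = {
    "South Baltic": 41.0,
    "Atlantic": 32.5,
    "West Asian": 18.9,
    "East Med": 4.5,
    "Siberian": 3.1,
}

solid = drop_noise(result)
print(solid)
```

The small slices that get dropped aren't necessarily wrong, they are just too small to hang a family story on.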

And then there's the test to find Ashkenazi DNA............

This is a comparison of the EU Test and the J Test options. It is recommended by the developer that you compare both as Ashkenazi can show a false positive. My supposed Jewish DNA is very small, so it's more likely than not that it's just noise. I explained all of this to my mother, but she's been convinced for years that we're Jewish, so she clung to this like the last chopper out of Nam and has told everyone how her smart daughter found the proof. Oy vey.

In the end, I guess what I'm saying is that the admixture tools should be used for fun, not actual science. Maybe one day, but there's no definitive proof that's going to give us the answers we're looking for without a large margin of error. And speaking of "fun" tools with large margins of error.....

Eye Color and Rare SNP
I'm not sure, but I don't think they're blue...

There is an option to see what your eye color is. I know, a mirror works just as well, be quiet and listen. This tool looks at the multiple mutations required to make your specific eye color. If you recall my earlier posts, it's not just one gene that decides if you have blue eyes, it can be dozens. So I checked my eye color and it says they are blue/grey. There's some markers for brown and I'm supposed to have golden irises or some such thing, but the user submitted photo of an eye that matches my DNA is bluish grey. In case you've not seen me before, I have very dark brown eyes. They've lightened a bit as I've aged, but when I was a child they were almost black. I can say that I've now studied my eyes more than I've ever done before and I do see some golden flecks near the pupils. And there seems to be some bluish tones that could lead some to say I have hazel eyes. So maybe as I get older, my eyes will start to go grey like my hair? Why do I find that so cool?!

The other supposedly fun tool is the Rare SNP calculator. This tool looks for SNPs that aren't found in the majority of the people in the database. Why would you want to know about these rare SNPs? Well, if you have something truly uncommon, it could help you connect to others who are part of that small group of people. They would very likely be related to you in order to have received that same rare mutation. There is a warning, however, that you need to take super seriously: these rare SNPs can impart medical information! The medical information is based on studies (whether or not they are good studies is up for debate) that claim to connect that mutation with a disease or resistance or increased risk (or decreased risk). Please, don't use this utility as a replacement for talking to a medical professional. First, it's set up in a very confusing way. Second, just because you carry a gene doesn't mean that gene is actively working against you. If you want to know your medical genetic information, see a genetic counselor. I need to also warn you that there is a new option to join the SNP pool. This allows you to compare your rare SNPs directly with others who have joined the pool. You will be able to see their medical information based on their DNA and they will see yours. I see no genealogical value in this tool at this point, so I actually don't recommend using it. It's not worth working yourself up over every cancer gene you have. And it's certainly not worth letting strangers work themselves up over every cancer gene you have.
So welcome to the information overload. Now you know what's been taking me so long! I've got one more utility I want to cover and then I'm thinking a wrap up on this series (I'm getting a bit tired of DNA to tell you the truth). I know I've not covered all you'll need to know about Gedmatch.com, but I have three last points:

1. I've started a discussion group on Facebook for those looking to compare notes and ask questions: Gedmatch. Discussion Group. Even though Gedmatch is in the title, we'll discuss anything related to genetic genealogy. We've got some files and explanations already up, but no question is discouraged.

2. When you upload your raw data, you'll be given a kit number. It's usually a letter that indicates what company you took your test with and a series of numbers. You need that number to run the tests. You can share the number on discussion groups, but then anyone can input your number and see your matches, so it's up to you. Since this is true of the people who match you, note that uploading to this site allows them to see your matches and compare you to others, but only you will be able to use your rare SNP utility (unless you join the pool). Also, no one else can access your raw data. With that said, this is not for those who want to keep their DNA private. Yes, you don't have to give contact information. But I can take your anonymous kit number and look at your matches. With gumption and time, I could use that information to identify you via family trees and records. It's not easy, so it's not a large worry, but it's something those worried about privacy should consider.

3. This is a volunteer site. They don't charge a membership fee. When AncestryDNA finally allowed raw results to be downloaded, this site crashed twice from the number of people uploading their results. Now it's backlogged and runs the risk of crashing again when it allows new uploads. PLEASE take a moment to donate to the site via Paypal or the snail mail address listed on the site. You don't have to pay, but it's the right thing to do. This site makes using third party genetic tools easier for everyone. They give you a place to compare your DNA to people who took tests from other companies. They ask for nothing. They deserve support. Your money makes sure they can keep the site running and buy the server space necessary to give you as many matches as possible. Our continued support will mean improvements to the site as well! Even if all you can give is a little, give it. Give it twice. When you have a little extra cash and think about all those wonderful new matches you've discovered, give that money to show how grateful you are for this opportunity. Encourage others to give. Donate now so they aren't forced to make it a paid site to keep the lights on! (Don't think it can't happen).

Alright, I'm done for now. Up next is a second third party tool you may enjoy and it's only $5!
-Ana

19 comments:

  1. There's some good info here. I just got my genome the other day from 23andMe, and uploaded it to Gedmatch. Now at least I have an inkling of what I have to do to get meaningful results back. Thanks!

    Replies
    1. Thank you for reading, KofArizona! If you've not already, there is a link in the article to my FB group for Gedmatch and genetic genealogy. I would invite you to join as we discuss many topics that could help you understand your results.

  2. Anonymous 8/9/13 17:51

    This comment has been removed by a blog administrator.

  3. Anonymous 11/9/13 10:45

    It's interesting. Everyone inherits anywhere from 0% to 50% of each one of their grandparents' DNA, and what part is inherited is unpredictable. I inherited a lot of Eastern European DNA with 54.36%, according to DNAMatch, and just 7.88% West Asian, which I have no clue as to who I inherited the West Asian from lol. People that are 100% full blooded Native Americans on average have 75% East Asian DNA, and 25% European DNA. So I'm clueless, and I already knew that 1/8th of my ancestry is Slavic. I inherited most of my DNA from the Slavic people according to this, and part West Asian.

  4. I'm curious, do you have any idea what might cause the following: I have a match on GedMatch (34cM, 4.9 generations), but we aren't a match in Ancestry. Both of us uploaded our data from Ancestry to GedMatch. I just don't get it.

    Replies
    1. What location on the chromosome is it? Some facilities don't recognise segments that cross the centromere. It's also possible that Ancestry places the match farther back than Gedmatch and that you are both there, but in the distant match category and buried there.

  5. Anonymous 2/1/14 17:14

    There is no Ashkenazi category in the EU test so of course it came up zero.

    Replies
    1. Yes, thank you. The point is to look at where the % from the J Test goes when there is no category for it in the EU test. If it was true Ashkenazi, then it wouldn't disperse itself amongst my other categories. It would either stand alone or inflate the Mediterranean.

  6. Anonymous 15/1/14 15:41

    Is there a possibility that I could lose my raw data? I saw where Ancestry said that once you download, you are responsible for keeping your own data. Will I be able to keep a copy as a backup?

    Replies
    1. You can lose your data if you delete it completely from Ancestry.com. If you delete it, they cannot reupload it or replace it without buying a new kit. Once you download the data, you are responsible *for that copy*. That download is yours and you can do with it as you like. Make copies for backups, upload to other sites, etc. But Ancestry will take no responsibility for the results or issues that arise from uploading that raw data to other sites.

  7. Three months later and my DNA kit is STILL being processed through GEDmatch; the site has had multiple server issues and seems to be a shell of a website. I was initially very hopeful to see my results; now it appears that the site is permanently disabled for many.

    Replies
    1. Remember that it is a volunteer run site and that they did get a new server just before Christmas, but there were also DNA sales all through November to January. If you don't get batched soon, I'd recommend checking the tool that checks for errors in your upload and then the FAQ of what's new. If there's no announced delay or timeline (which they've been updating in the FAQ), I would email them to have them investigate.

  8. Anonymous 7/4/14 09:54

    I think I will love this site but can't get past the gedmatch registration. The link seems to be dead. Any advice?

  9. Appears GedMatch is down. Maybe they couldn't get enough donations to keep it going :(

  10. Anonymous 5/5/14 14:48

    Gedmatch.com has been down for days for "maintenance." No word as to if and when it'll be back up.

  11. Anonymous 28/5/14 15:24

    GEDMATCH IS UP! I went on a couple of days ago and uploaded and played with my raw data using their tools. Hopefully I will get results soon!

    Replies
    1. Anonymous 2/6/14 16:13

      And it's down again. You will learn, as we all do, that Gedmatch is a precious rainbow that we appreciate while it's there, and spend the rest of our time awaiting its return. :-)

  12. Has anyone seen a concise write up on the significance of the data found under Eurogenes Oracle links after some of the admixture runs complete - in particular "Single Population Sharing" and what the "Distance" values are actually telling you? I need background and definitions before I can make heads or tails out of this data! Anyone?
