Learning Objectives

Understand how the behavior of a population with finite size differs from the behavior of an idealized Hardy-Weinberg population with infinite size.
Practice Hardy-Weinberg calculations.
Develop an intuitive understanding of the influences of population size and initial allele frequencies on genetic drift.
Build skills in data management and making figures in Excel.

Introduction

For this workshop, you will be using a simulator of a finite Hardy-Weinberg population.

This is a simulation of diploid organisms that can be either red, purple, or blue. They have a single locus for color with two alleles R and B. Allele R codes for red pigment and B codes for blue. Individuals who are homozygous for R are red, individuals who are homozygous for B are blue, and heterozygotes are purple.

The simulation interface shows the current counts of red, purple, and blue individuals as well as the population allele frequencies (p is the frequency of the R allele, q is the frequency of the B allele). You can set the population size (N) and initial frequency of the red allele (p).

The simulation works by first creating a population of red, purple, and blue individuals according to the population size (N) and the initial frequency of the red allele (p). When you hit the ‘Run simulation’ button, a gamete pool is created in which each red individual contributes 2 R alleles, each purple individual contributes 1 R allele and 1 B allele, and each blue individual contributes 2 B alleles. To simulate random mating, each new offspring is created by randomly selecting, with replacement, two gametes from the gamete pool. This process is repeated N times so that there are N new individuals for the next round. Because the simulation is random, you will probably get different numbers of offspring of each color every time you run it. Try running several simulations for 1 time step each, hitting the ‘Reset simulation’ button each time. You should see slightly different counts of the different colored offspring from run to run.
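The random-mating step described above can be sketched in a few lines of Python. This is only an illustration of the gamete-pool idea, not the simulator's actual code; the function name `next_generation` and its arguments are hypothetical.

```python
import random

def next_generation(n_rr, n_rb, n_bb, rng=random):
    """Simulate one generation of random mating; return new (RR, RB, BB) counts."""
    n = n_rr + n_rb + n_bb
    # Gamete pool: each red gives 2 R, each purple gives 1 R and 1 B, each blue gives 2 B.
    pool = ["R"] * (2 * n_rr + n_rb) + ["B"] * (n_rb + 2 * n_bb)
    counts = {"RR": 0, "RB": 0, "BB": 0}
    for _ in range(n):
        # Draw two gametes with replacement to form one offspring.
        a, b = rng.choice(pool), rng.choice(pool)
        if a == b:
            counts[a + b] += 1   # two R gametes -> red, two B gametes -> blue
        else:
            counts["RB"] += 1    # one of each -> purple
    return counts["RR"], counts["RB"], counts["BB"]

# Example: start from 5 red, 10 purple, 5 blue (N = 20, p = 0.5).
rng = random.Random(1)  # fixed seed so the run is repeatable
print(next_generation(5, 10, 5, rng))
```

Changing the seed plays the same role as pressing ‘Reset simulation’ and running again: the counts vary from run to run, but they always add up to N.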

For this exercise, you will want to review the Hardy-Weinberg model and the formulas for observed and expected allele and genotype frequencies. Recall that at HW equilibrium, the expected genotype frequencies can be calculated from the population allele frequencies p and q.

Using the symbols for population allele frequencies (p and q), write the expressions for:

\(F_{exp}[RR] = ?\)
\(F_{exp}[RB] = ?\)
\(F_{exp}[BB] = ?\)

If N is the population size, write expressions for:

\(N_{exp}[RR] = ?\)
\(N_{exp}[RB] = ?\)
\(N_{exp}[BB] = ?\)

Getting to Know the Simulator

Click here to open the simulator in a new tab.

Notice that the simulation initializes with a population of size 20 and p = q = 0.5.

Move the slider and set the ‘Initial % red’ parameter to 75%. Now calculate the expected counts of individuals of each genotype for a population of 20 with p = 0.75.

\(N_{exp}[RR] = ?\)
\(N_{exp}[RB] = ?\)
\(N_{exp}[BB] = ?\)

When you calculate expected genotype counts, they don’t have to be whole numbers, and they often won’t be!

Press ‘Reset simulation’ to initialize the simulator with the updated p. Is the initial population in HW equilibrium?

Now hit the ‘Run simulation’ button to simulate one generation of random mating.

Note the new genotype counts and the updated values of p and q.

Recalculate the expected genotype frequencies and counts. Is the population still in HW equilibrium? If not, which genotypes are over- or under-represented?

\(N_{exp}[RR] = ?\)
\(N_{exp}[RB] = ?\)
\(N_{exp}[BB] = ?\)
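If you want to double-check the p and q values the simulator reports, they follow directly from the genotype counts: every red individual carries two R alleles, every purple individual carries one, and there are 2N alleles in total. A minimal Python sketch, with a hypothetical function name and made-up example counts:

```python
def allele_frequencies(n_rr, n_rb, n_bb):
    """Return (p, q) computed from observed genotype counts."""
    n = n_rr + n_rb + n_bb
    # R alleles: two per red individual, one per purple; 2N alleles overall.
    p = (2 * n_rr + n_rb) / (2 * n)
    return p, 1 - p

# Made-up counts for illustration: 9 red, 8 purple, 3 blue (N = 20).
p, q = allele_frequencies(9, 8, 3)
print(p, q)
```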

The Hardy-Weinberg Calculator

Doing these calculations by hand is good practice, but it quickly gets very tedious. You’ll be doing quite a few such calculations for this workshop, so now that you’ve had some practice doing them on your own, you should check out the HW calculator Excel spreadsheet. All of the formulas are already set up for you; you simply need to input a few values:

N – the population size. Enter this value in cell B2.
p – the frequency of the R allele. Enter this value in cell B5.
Observed genotype counts. Enter these in the cells D16 – D18. This will allow you to compare the expected counts to the observed counts.

The expected genotype frequencies and counts will be calculated for you. When you enter the observed counts, the calculator will calculate ‘delta’, which is the sum of the differences between the observed and expected counts. It will also calculate the ‘strain’ which is a measure of how far away the observed counts are from HW equilibrium.
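For readers who prefer code to spreadsheets, the expected-count part of the calculator can be sketched in Python. Two assumptions to flag: ‘delta’ is implemented here as the sum of absolute differences between observed and expected counts, which may not match the spreadsheet's exact formula, and ‘strain’ is not reproduced because its formula is not given in this handout. All function names and example numbers are hypothetical.

```python
def expected_counts(n, p):
    """Expected Hardy-Weinberg genotype counts for population size n and R frequency p."""
    q = 1 - p
    return {"RR": p * p * n, "RB": 2 * p * q * n, "BB": q * q * n}

def delta(observed, expected):
    """Assumed definition: sum of absolute observed-minus-expected differences."""
    return sum(abs(observed[g] - expected[g]) for g in expected)

# Made-up example: N = 100, p = 0.6, with observed counts that give the same p.
expected = expected_counts(100, 0.6)
observed = {"RR": 40, "RB": 40, "BB": 20}
print(expected)
print(delta(observed, expected))
```

As with the hand calculations, the expected counts need not be whole numbers.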

NOTE: be careful to only enter data into the cells specified above. The other cells contain formulas for the calculations. The calculator will not work if you overwrite any of these formulas.

Use the calculator to check your calculations from the previous section.

Using the Simulator

Now take a few minutes to play around with the simulator. Try different settings for N and the initial frequency of p.

Using the default settings (N = 20, p = 50%), run the simulation one step at a time until an allele reaches fixation. Do this several times.

Did it take the same number of steps to reach fixation each time? Did the same allele reach fixation each time?

Now set N to be a larger number and run several simulations.

Does it take about the same time for one allele to reach fixation with the larger population size?

Why do you think it takes longer for larger populations to reach fixation?

NOTE: if it is taking many time steps to reach fixation, you can adjust the number of time steps taken per turn using the ‘Generations per step’ slider. You can set it to run up to 100 generations at a time.
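The run-until-fixation experiment can also be sketched in Python. Because each offspring is built from two gametes drawn with replacement, one generation amounts to 2N independent draws that each yield R with probability p, so the sketch tracks only the number of R copies. The function name, the starting count of round(2Np) R copies, and the generation cap are assumptions made for this illustration, not features of the simulator.

```python
import random

def generations_to_fixation(n, p0, rng, max_generations=100_000):
    """Return (generations, fixed_allele), or (None, None) if the cap is reached."""
    total = 2 * n                    # number of allele copies in the population
    r_copies = round(total * p0)     # assumed starting number of R copies
    for generation in range(max_generations):
        if r_copies == total:
            return generation, "R"
        if r_copies == 0:
            return generation, "B"
        p = r_copies / total
        # 2N gamete draws with replacement, each R with probability p.
        r_copies = sum(rng.random() < p for _ in range(total))
    return None, None

rng = random.Random(42)
for run in range(5):
    print(generations_to_fixation(20, 0.5, rng))
```

Each printed pair is one replicate run: the number of generations taken and the allele that reached fixation.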

Let’s use the calculator to examine some of the simulation results.

Using a population size of 20 and an initial red allele frequency of your choosing, calculate and record the values of delta and strain at the beginning of the simulation. Simulate for one time step and recalculate the values.

Did they change very much?

Simulate a few more time steps, recording the values of delta and strain each time.

What do these values tell you about how well HW equilibrium is maintained in a small population?

Now repeat the procedure with a population size of 20,000. Compare your strain values to the simulation with N = 20.

Can you explain the differences?

Graphing the Results

Now we want to explore the model’s behavior a little more systematically by taking a graphical approach to build intuition.

Notice that the simulator has a ‘Save results to csv file’ button. This allows you to save the results for a simulation to a csv file that you can open in Excel. For each time point it records the observed and expected counts of red, purple, and blue individuals, the R and B allele frequencies, delta, and strain.

Run a simulation with your choice of N until one allele reaches fixation, then export the results file. Give the file a descriptive filename and save it somewhere you will remember! Open the file in Excel and take a moment to look at how the file is organized.

Now make a graph of the populations of the genotypes over time. To do this, select the three columns for the observed counts of red, purple, and blue individuals (columns B, C, and D). Next click on the ‘Insert’ tab and select line chart. Do not select a stacked line chart. Note this procedure may be slightly different depending on which version of Excel you are using. You may also complete this exercise using Google Sheets or another spreadsheet program, but the exact procedure for producing the graphs will vary.

What does this graph show?

What happens to the proportions of the three genotypes over time?

We can also plot the allele frequencies over time. Select the columns for the R and B alleles and make another line chart.

How is this graph different from the previous one?

What is similar between the two graphs?

Note that the data file is in csv format, which is not a fully functional Excel format. It does not allow you to save formulas or charts. To continue, you will need to create a blank spreadsheet in Excel in which you will recreate your charts so that you can save them. Name your spreadsheet something descriptive like ‘Finite population HW simulation.xlsx’.

Run a new simulation with a population size in the range of 20 – 50. Run your simulation for enough steps for one allele to go to fixation and export your results to a csv file. Open the csv file and select the three columns for the genotype frequencies. Copy these columns into the workshop Excel file that you just created. Next copy the p and q columns and paste them next to the genotype frequency columns. Create line plots of the genotype and allele frequencies.

Do the same for a second simulation with a population in the range of 200 – 500 and export the results. Create a new tab in your spreadsheet and paste the genotype and allele frequencies into it. Make line plots for the genotype and allele frequencies.

What is different in the plots for the small and large population size simulations?

Things to Try

You can use the simulator to help you develop an intuitive feel for what happens in finite populations undergoing genetic drift. Play with the simulator and try graphing your results from different simulations. Some things to investigate:

Run 10 simulations with a population size of 100 and an initial red allele frequency of 50%. How long does it take, on average, for one of the alleles to reach fixation?
What do you think you will observe if you repeat that process with an initial red allele frequency of 25%? Run the simulations and compare the average time to fixation for the two initial frequencies.
We know that larger populations take longer to reach fixation, but if we were to plot population size against time to fixation, what would the curve look like? Would it be linear, logarithmic, exponential, or something else? Make a spreadsheet with two columns: one for population size (N) and one for time to fixation. For different values of N between 20 and 500, run replicate simulations and record the number of generations it takes to reach fixation. Graph your results in Excel and describe the curve.
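If clicking through many replicate runs becomes tedious, the same experiment can be sketched in Python and the output pasted into your spreadsheet. This is an illustration under stated assumptions rather than the simulator's own code: it tracks only the number of R copies, treats each generation as 2N gamete draws with replacement, and starts from round(2Np) R copies. The function names, the seed, and the population sizes are all hypothetical choices.

```python
import random
from statistics import mean

def generations_to_fixation(n, p0, rng):
    """Count generations until one allele is lost (allele-count sketch)."""
    total = 2 * n
    r_copies = round(total * p0)     # assumed starting number of R copies
    generations = 0
    while 0 < r_copies < total:
        p = r_copies / total
        # 2N gamete draws with replacement, each R with probability p.
        r_copies = sum(rng.random() < p for _ in range(total))
        generations += 1
    return generations

def mean_fixation_time(n, p0, replicates, rng):
    """Average time to fixation over several replicate runs."""
    return mean(generations_to_fixation(n, p0, rng) for _ in range(replicates))

rng = random.Random(2024)
print("N", "mean generations to fixation")
for n in (20, 50, 100, 200):
    print(n, mean_fixation_time(n, 0.5, 10, rng))
```

Ten replicates per population size is a small sample, so expect the averages to be noisy. Change the second argument of `mean_fixation_time` to compare different initial frequencies.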
