Department of Genome Sciences and Department of Biology, University of Washington
© Copyright 2000, 2009 by the University of Washington. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.
Contevol is a simulation program designed for my class in Evolutionary Genetics. It allows the user to simulate the evolution of a quantitative character that is controlled by 5 loci, under a fitness function designed by the user. The user gets to see the distribution of the values of the quantitative character in a population, generation after generation. The genotypes that make up this distribution (or at any rate a representative sample of them) are displayed within that histogram.
Contevol is copyright 2000 by the University of Washington. Permission is granted to copy and use the program as long as its copyright notices are not removed. It can be used for free as long as it is not sold, or offered for sale as part of any product or service.
The population consists of N diploid individuals (the default value of N is 100, but you can change this at run time). Each individual has 5 loci. Each locus has two alleles, one written as a capital letter (A, B, C, D, or E) and the other as the corresponding lower-case letter (a, b, c, d, or e). The phenotype is additively determined by these loci, and it is equal to the number of capital letters. There is no environmental effect on the phenotype (unlike real quantitative characters). Thus, for example, the diploid genotype AABBccDdee has phenotype 5 (the number of capital letters in this genotype).
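Here is a minimal C sketch of the phenotype calculation. It assumes that a genotype is held as a string of ten letters, two per locus, written as in the text. This layout and the name `phenotype` are hypothetical and are not necessarily how contevol.c stores individuals.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define NLOCI 5  /* number of loci, as in the text */

/* Phenotype = number of capital letters among the 2 * NLOCI gene copies.
   The genotype is assumed (hypothetically) to be a string such as
   "AABBccDdee", with the two letters of each locus next to each other. */
int phenotype(const char *genotype)
{
    assert(genotype != NULL);
    int count = 0;
    for (size_t i = 0; i < 2 * NLOCI && genotype[i] != '\0'; i++) {
        if (isupper((unsigned char)genotype[i]))
            count++;
    }
    return count;
}
```

With this layout, `phenotype("AABBccDdee")` returns 5, which agrees with the worked example in the text. Because the character is purely additive with no environmental effect, counting letters is the whole genotype-to-phenotype map.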
The loci are assumed to be unlinked (as if they were on different chromosomes), so each locus segregates into a gamete independently of the others. Mating is at random, with only one sex and selfing allowed. This means that a mating is performed by choosing one parent from the population at random, then choosing the other parent at random with replacement. Thus it is possible (a small fraction of the time) to choose the same individual twice and have self-fertilization. (We could imagine ruling this out by requiring that the two parents be different individuals; this would make only a small difference in the outcome of the simulation.)
Each offspring is produced by choosing a new pair of parents and mating them. The choices for different offspring are independent and made with replacement, so some individuals may have more than one offspring and others none. This is a standard evolutionary genetics model called the Wright-Fisher model (as it was invented in 1930 and 1932 by the two great founders of that field, R. A. Fisher and Sewall Wright). In effect, each offspring independently chooses its own parents.
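The mating scheme can be sketched in C as follows. This is an illustration under assumptions, not the code in contevol.c. Genotypes are assumed to be ten-letter strings with two letters per locus, and the function names are hypothetical. The standard `rand()` function stands in for whatever random number generator the program actually uses.

```c
#include <assert.h>
#include <stdlib.h>

#define NLOCI 5

/* Random integer in [0, n). rand() % n is slightly biased; fine for a sketch. */
static int random_index(int n)
{
    assert(n > 0);
    return rand() % n;
}

/* A gamete carries one gene per locus. Because the loci are unlinked,
   each locus independently passes on either of the parent's two copies
   with probability 1/2. */
static void make_gamete(const char *parent, char gamete[NLOCI])
{
    for (int locus = 0; locus < NLOCI; locus++)
        gamete[locus] = parent[2 * locus + rand() % 2];
}

/* Wright-Fisher reproduction for one offspring: both parents are drawn
   independently and with replacement from the n individuals of the
   previous generation, so the same individual is drawn twice (selfing)
   with probability 1/n. */
void make_offspring(const char *const pop[], int n, char child[2 * NLOCI + 1])
{
    const char *parent1 = pop[random_index(n)];
    const char *parent2 = pop[random_index(n)];
    char g1[NLOCI], g2[NLOCI];

    make_gamete(parent1, g1);
    make_gamete(parent2, g2);
    for (int locus = 0; locus < NLOCI; locus++) {
        child[2 * locus] = g1[locus];
        child[2 * locus + 1] = g2[locus];
    }
    child[2 * NLOCI] = '\0';
}
```

Calling `make_offspring` once per offspring, each time with freshly drawn parents, is what lets some individuals leave several offspring and others none.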
To start the program, simply double-click on its icon (if you are on a Macintosh or Windows system), or type the program name contevol if you are on a Unix (or Linux) system. You will be presented with a small menu, which looks like this:
Contevol (c) Copyright University of Washington 2009
The menu shows you the current values (in this case the default values) of six of the parameters of the program. The column labelled "Option character" shows which character you should type to change the value of each parameter. For example, to change the population size, type "N". When you choose a parameter to change, the program will prompt you for its new value and then return you to the menu. When all the parameters have their desired values, you can accept them by typing the character Y. If you want to stop the run at this point, you can type Q.
When the parameters are accepted, the first thing the program does is to show you the fitness curve (the adaptive surface or fitness function). This shows the fitnesses of the different possible phenotypes, plotted against the phenotype. The plot is very crude because it is made using characters on a screen rather than by drawing curves in a graphics window. For example, if the fitness function had two optima, with optimum phenotypes 3 and 6, and strength of selection 1, the plot would look like this:
(Screen image: a character plot of this fitness curve, with fitness, labelled from 0.09 to 1.00, on the vertical axis and the phenotypes 0 through 10 on the horizontal axis.)
to continue, press Enter key, to stop press Q, to do menu again press M
The heights of the letters "O" show (roughly) the fitnesses for each of the possible phenotypes, from 0 to 10. The dots interpolate between these roughly in straight lines (the values at those interpolated points do not matter, as all of the phenotypes always turn out to be whole numbers).
The fitness is determined by a curve which has one or two optimum values of the phenotype, with the number of optima and the optimum values determined by the program menu. Initially the menu shows one optimum with a value of 5.0. The user can either change that value, or can choose to change the number of optima from 1 to 2. In that case the O menu item is replaced by two menu items, 1 and 2. In the above example, two optima have been chosen and given the values 3.0 and 6.0. Although the phenotypes must (in our artificial example) be integers, the optima can be any real numbers. If they are too far outside the range of the phenotypes, problems may occur from underflow of the fitnesses.
The fitness curve around an optimum falls away according to a Gaussian (or Normal) curve. This has the shape of the function
$$e^{-x^2}$$
but it is shifted so that the peak occurs at the optimum, and compressed or spread out so that the curve falls away at the desired rate. The rate at which it declines is controlled by another parameter in the menu, the strength of selection. This value (S) is larger the weaker selection is, and smaller the stronger it is. As the selection curve has the form
$$e^{-x^2/S}$$
where $x$ is the difference between the phenotype and the optimum, the fitness function will fall to $e^{-1} = 0.367879$ of its peak value when $x^2 = S$, that is, when $x$ equals $\sqrt{S}$. Thus a fitness function with S = 10 falls to 0.368 of its peak height when the phenotype is $\sqrt{10} \approx 3.162$ units from the optimum. A fitness function with S = 1, which represents stronger selection, falls to 0.368 of its peak height when the phenotype differs from the optimum by 1.
When there are two optima, the fitness function is the sum of the fitness curves from the two optima. The heights of the curves are added. The fitness function shown in the example above shows this (look at the fitness in the region between the two optima where the curve does not fall away as quickly as it does on the outside of the optima). Note that it is the height (the fitness value) which is added, not the phenotype, which is the horizontal scale.
The fitness function is scaled so that the fitness of the best phenotype is 1. Whether or not this scaling is done actually has no effect on the evolution of the population, but it makes the curve easier to look at when plotted.
As you can see from the bottom line of the above screen image, when you are finished looking at the fitness curve, you will probably want to move on to the simulation itself, by pressing the Enter key. If you want to return to the menu, press M, and if you want to abort the run and stop the program at this point, press Q.
One of the menu items (I) is the initial mean of the character. The initial population is set up as a sample of N individuals drawn from an infinitely large population which has this mean phenotype. Each locus in that infinite population has the same gene frequency, and there is no association ("linkage disequilibrium") between the alleles at different loci. This means that the program determines a gene frequency and draws each gene from a "gene pool" that has that frequency. For example, if the mean phenotype is set to 3, with 5 loci that means that an average of 3 of the 10 gene copies in an individual should be capital letters. Thus the desired gene frequency is 3/10 = 0.3. The program draws genes from a gene pool with gene frequency 0.3 at all five loci, so each letter in each individual's initial genotype has, independently, a 0.3 chance of being a capital letter.
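The construction of the initial population can be sketched as below. As in the other sketches, several things are assumptions made for illustration: the ten-letter string layout, the function names, and the use of `rand()`.

```c
#include <assert.h>
#include <stdlib.h>

#define NLOCI 5

/* Gene frequency implied by the initial mean phenotype:
   the mean counts capital letters among 2 * NLOCI gene copies. */
double initial_gene_frequency(double mean)
{
    assert(mean >= 0.0 && mean <= 2 * NLOCI);
    return mean / (2 * NLOCI);
}

/* Draw one individual from an infinite gene pool in linkage equilibrium:
   every gene copy is independently capital with probability p. */
void draw_initial_individual(double p, char genotype[2 * NLOCI + 1])
{
    for (int i = 0; i < 2 * NLOCI; i++) {
        int locus = i / 2;                             /* 0 = A, ..., 4 = E */
        double u = rand() / ((double)RAND_MAX + 1.0);  /* uniform in [0, 1) */
        genotype[i] = (char)((u < p ? 'A' : 'a') + locus);
    }
    genotype[2 * NLOCI] = '\0';
}
```

With an initial mean of 3, `initial_gene_frequency(3.0)` is 0.3. Calling `draw_initial_individual(0.3, g)` N times then gives a starting population whose mean phenotype is 3 on average, though not exactly 3 in any one sample.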
The program runs a simulation of evolution of the population under random mating and free recombination among the 5 unlinked loci. In each generation, for each offspring, two parents are sampled at random (with replacement) from the previous generation. The offspring is produced by Mendelian reproduction with no linkage among the loci (as if they were on separate chromosomes). Its phenotype is then determined by counting the capital letters in its genotype.
Selection then occurs by calculating the fitness of the offspring from its phenotype value, and using that fitness as the probability of retaining the offspring. Thus if the fitness is (say) 0.72, the program draws a random fraction between 0 and 1 and retains the individual if that random number turns out to be less than 0.72. This gives the offspring the desired 0.72 chance of survival. Offspring are produced, one after another, until N of them have survived. They constitute the next generation.
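The survival rule and the "keep producing until N survive" loop can be sketched like this. The callback `new_offspring` is hypothetical and stands for the mating step. For brevity the sketch keeps only phenotypes, where a full program would keep the survivors' genotypes.

```c
#include <assert.h>
#include <stdlib.h>

/* Uniform random number in [0, 1). */
static double uniform01(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* An offspring with fitness w survives with probability w. */
int survives(double w)
{
    return uniform01() < w;
}

/* Produce offspring one at a time until n have survived.
   new_offspring (hypothetical) mates two random parents and returns the
   offspring's phenotype; fitness[] holds the scaled fitness of each
   phenotype. Returns how many offspring were produced in total.
   Loops forever if every fitness is 0, so callers must avoid that case. */
long select_generation(int n, int (*new_offspring)(void),
                       const double fitness[], int survivors[])
{
    assert(n >= 0 && new_offspring != NULL);
    long produced = 0;
    int kept = 0;
    while (kept < n) {
        int phen = new_offspring();
        produced++;
        if (survives(fitness[phen]))
            survivors[kept++] = phen;
    }
    return produced;
}
```

The return value divided by N indicates how hard selection is acting. When fitnesses are near 1 almost every offspring is kept. Under strong selection, many offspring may be needed per survivor.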
The menu parameter L allows you to change the number of lines of text that will appear on the screen. If you increase it, you will see a larger histogram, showing more completely what is happening.
The menu parameter G sets how many generations the program will simulate each time before showing the user the result. The histogram of the population is displayed every G generations, starting with the G-th generation (thus if G=3 the user will see generations 3, 6, 9, 12, ...). To go forward G more generations you should press the Enter key. If you do not want to continue you can press S to start over with the same parameter values. To return to the menu and change the parameter values, press C. To quit the program, press Q.
The histogram that is shown is a sample of the individuals in the population. Each individual is shown as a box of letters showing its genotype. Thus an individual may be represented as
AbcDE
ABcDe
These individual genotypes are placed in columns according to their phenotypes. Only a sample of the population can be shown, so that the columns are not too high to fit on the screen. At the bottom of the histogram the phenotype values are shown on the dashed axis, and below that are the numbers of individuals in the population that have that phenotype value. Thus you might see (say) a column of 8 individuals at phenotype value 6, but the number beneath the 6 on the axis shows that of the N individuals in the population, 36 actually have that phenotype value. (The individuals that are shown in the histogram are simply the first ones that were produced in that generation.)
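The counts printed under the axis, and the choice of which individuals to draw, can be sketched as below. The text does not state how many individuals are drawn in each column. This sketch assumes, hypothetically, that each column is scaled in proportion to its count so that the tallest column is `max_height` boxes high, rounded to a whole box, and that the first individuals produced are the ones drawn.

```c
#include <assert.h>
#include <ctype.h>

#define NLOCI 5
#define NPHEN (2 * NLOCI + 1)  /* phenotypes 0..10 */

static int count_capitals(const char *g)
{
    int c = 0;
    for (int j = 0; j < 2 * NLOCI; j++)
        if (isupper((unsigned char)g[j]))
            c++;
    return c;
}

/* counts[p] = number of individuals with phenotype p (the numbers under
   the axis). shown[i] = 1 if individual i is drawn, under the assumed
   rule: column height = count * max_height / largest count, rounded,
   filled with the earliest-produced individuals of that phenotype. */
void histogram(const char *const pop[], int n, int max_height,
               int counts[NPHEN], int shown[])
{
    assert(n > 0 && max_height > 0);
    int largest = 0, height[NPHEN], drawn[NPHEN] = {0};

    for (int p = 0; p < NPHEN; p++)
        counts[p] = 0;
    for (int i = 0; i < n; i++)
        counts[count_capitals(pop[i])]++;
    for (int p = 0; p < NPHEN; p++)
        if (counts[p] > largest)
            largest = counts[p];
    for (int p = 0; p < NPHEN; p++)
        height[p] = (counts[p] * max_height + largest / 2) / largest;
    for (int i = 0; i < n; i++) {
        int p = count_capitals(pop[i]);
        shown[i] = drawn[p] < height[p];
        if (shown[i])
            drawn[p]++;
    }
}
```

Take counts of 3, 7, 24, 30, 21, 13 and 2 at phenotypes 4 to 10, with `max_height` of 10. This rule gives columns of 1, 2, 8, 10, 7, 4 and 1 boxes, which is consistent with the typical histogram that follows. That agreement is suggestive only.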
Here is a typical histogram, to give you some idea what it looks like:
|                                           ABCDE
|_     _     _     _     _     _     _     _AbcdE_     _     _
|                                           ABCDE
|_     _     _     _     _     _     _     _AbcdE_     _     _
|                                     ABcDE ABCDE
|_     _     _     _     _     _     _abcDE_AbcdE_     _     _
|                                     ABcDE ABCDE ABCDE
|_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_     _
|                                     ABcDE ABCDE ABCDE
|_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_     _
|                                     ABcDE ABCDE ABCDE
|_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_     _
|                                     ABcDE ABCDE ABCDE ABCDE
|_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_ABCdE_
|                                     ABcDE ABCDE ABCDE ABCDE
|_     _     _     _     _     _     _abcDE_AbcdE_ABcDe_ABCdE_
|                               aBCDE ABcDE ABCDE ABCDE ABCDE
|_     _     _     _     _     _aBcde_abcDE_AbcdE_ABcDe_ABCdE_
|                         ABCDe aBCDE ABcDE ABCDE ABCDE ABCDE ABCDE
|_     _     _     _     _abcde_aBcde_abcDE_AbcdE_ABcDe_ABCdE_ABCDE
L---0-----1-----2-----3-----4-----5-----6-----7-----8-----9----10---
    0     0     0     0     3     7    24    30    21    13     2
to continue, press Enter key, to stop Q, to run again S, to change case C
Note the underscores, which are used as "hash marks" to mark the bottom of each genotype. Note also that by running your eye up and down a column of genotypes you can get some sense of which loci are at high or low frequencies. Thus if we have a column in which all the letters at the B locus are capital letters, that should be immediately apparent. (However, to get an actual sense of the gene frequency at the B locus in the population, you need to look at all the columns.) When the population is fixed at all loci, with all gene frequencies 0 or 1, the histogram will show only one column. There is no point in continuing the run further at that point; you will want to either quit or start over.
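Reading gene frequencies off the columns is only approximate, but the exact figures are easy to compute from the whole population. Here is a sketch that uses the same hypothetical ten-letter genotype strings as above. The names `gene_frequencies` and `all_fixed` are illustrative only.

```c
#include <assert.h>
#include <ctype.h>

#define NLOCI 5

/* Frequency of the capital-letter allele at each locus, counting both
   gene copies of every individual (two letters per locus). */
void gene_frequencies(const char *const pop[], int n, double freq[NLOCI])
{
    assert(n > 0);
    for (int locus = 0; locus < NLOCI; locus++) {
        int capitals = 0;
        for (int i = 0; i < n; i++) {
            capitals += isupper((unsigned char)pop[i][2 * locus]) != 0;
            capitals += isupper((unsigned char)pop[i][2 * locus + 1]) != 0;
        }
        freq[locus] = (double)capitals / (2.0 * n);
    }
}

/* Fixed at all loci: every frequency is 0 or 1, so every individual has
   the same homozygous genotype and the histogram shows one column. */
int all_fixed(const double freq[NLOCI])
{
    for (int locus = 0; locus < NLOCI; locus++)
        if (freq[locus] != 0.0 && freq[locus] != 1.0)
            return 0;
    return 1;
}
```

A check like `all_fixed` is one way to decide automatically that a run has nothing more to show.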
There are many cases you can explore when running Contevol. Some questions you might ask include:
Contevol can be fetched from this web site:
http://evolution.gs.washington.edu/contevol
Here are some links that will help you get the particular files you need:
If you have a Windows computer you should get:
If you have a Mac OS X system you should get:
This puts a file on your system. Clicking on it will create a "disk image" and mount it on your desktop. Open the disk image. It will show a window that has a folder named contevol-1.01. Copy that folder to some other location (do not try to use the copy that is still in the disk image). The folder will have an executable which has an icon and is called contevol. It can be run by clicking on it. It will open a window which has a light blue background and shows the program menu. You then run the program by typing into that menu.
If you have a Linux system with an Intel-compatible processor such as a Pentium you should get:
Executables for less-frequently-used operating systems
If you have a Macintosh Mac OS 8 or 9 (PowerMac) system you should get:
If you have a Compaq/Digital Alpha system running Compaq (Digital) Unix you should get:
If you have another Unix or Linux system, you will want to compile the program yourself, which is not hard. You will need:
If you are intending to recompile the program using the CygWin Gnu C++ compiler on a Windows system (which you need to do only if you want to modify the program; otherwise you can just use the Windows executables), you will need:
This is mostly straightforward. Here are instructions for running them for Windows, Mac OS X, and Unix (Linux). Note that in the program menu you can change the number of lines of text that appear in the plots.
Windows
Double-click on the executable (make sure that the file cygwin1.dll is in the same folder as the executable). A Command window will open with the menu in it. It is rather small; I suggest clicking on the side of the Auto box in the upper-left part of the window and selecting a bigger font size such as 10x18.
Mac OS X
Double-click on the executable. A window will open in which the text of the menu and the plots will appear. It is not large, but it may be possible to resize it.
Unix (Linux)
Type the name of the executable such as contevol.linux. The menu and plots will appear in the current window.
You shouldn't have to compile it yourself! We provide precompiled executables, and those should be fetched and run unless you have some need to modify the program. For those few people who do, we will describe how to compile the source code on CygWin Gnu C++ for Windows, on Xcode for Mac OS X systems, on Gnu C++ on Unix or Linux systems, or on Metrowerks C++ on Mac OS 8 or Mac OS 9 systems.
Compiling with CygWin Gnu C++
Cygnus Solutions has adapted the Gnu C++ compiler to Windows systems and provided an environment, CygWin, which mimics Unix for compiling. Once you have installed the free CygWin environment and the associated Gnu C compiler on your Windows system, compiling Contevol is essentially identical to what one does for Unix or Linux. We have provided a CygWin Makefile so you can do this (again, for normal use you should not have to recompile at all).
On entering the CygWin environment you will find yourself in one of the subdirectories of the CygWin folder. Change to the folder where the Contevol program contevol.c and the file Makefile.cyg are located. You should then be able to compile contevol.c by issuing the command
make -f Makefile.cyg
The result should be a compiled executable called contevol.exe.
Compiling Contevol on Unix (or Linux) systems
If you have one of the kinds of Unix or Linux systems for which we do not distribute executables, or if you want to make changes in the source code, you can easily compile the source code. Make sure you are in a folder which contains our source file contevol.c and the file Makefile.unix. Then simply type the command:
make -f Makefile.unix
The result should be a compiled program called contevol.
Compiling in Mac OS X ...
If you want for some reason to compile an executable, follow these steps:
Compiling with Metrowerks Codewarrior ...
If you have Mac OS 8 or Mac OS 9, you should probably just use the PowerMac executable we supply, unless you have some reason to change the program yourself. These instructions are supplied for that rather rare case. We shall assume that you have a late version of the Metrowerks C++ compiler. This description, and the project files that we provide, assume Metrowerks 5.3. We also assume a reasonable familiarity with the use of the Codewarrior compiler and its Integrated Development Environment (IDE).
Start with a directory (folder) that contains the files contevol.c and Contevol.rsrc, both of which are provided by us.
Creating the project file. We have provided a project file contevol.proj. If you have it then you do not need to do the items on the following list. Skip down to the end of the list.
If you do have the contevol.proj project file, you can just:
Future of the program
The program needs to be caught up to the capabilities of current operating systems. We hope some time in the future to make a version with better graphics, plotting histograms in color instead of relying on the user to see the capital letters and the lower-case letters in the genotypes.
Joe Felsenstein
Department of Genome Sciences
University of Washington
Box 355065
Seattle, Washington 98195-5065, U.S.A.
Electronic mail address: joe (at) removethispart.gs.washington.edu