cricketsim - Methodology

Overview

The cricketsim program is a computer simulation that can be used to explore various evolutionary and genetic questions relating to speciation and species maintenance. It simulates a two-dimensional grid world (a "lattice" world) where each cell in the grid may have a terrain type and contain one or more crickets. Usually the world is filled with two separate populations of crickets, each from a different "species". There are potentially several endogenous and exogenous mating barriers between the two species, although simulations can be run with no mating barriers whatsoever. Each species is differentiated from others by internal genetic markers as well as outward traits (pulse rate, pulse rate preference, a "preferred" ideal terrain).

The simulated crickets have limited perception of the world about them and of nearby crickets. The simulation proceeds by discrete timesteps, where each cricket is allowed to perform one or several kinds of actions per timestep, all outcomes of their actions are computed, and then the next timestep begins. Each cricket receives input about its current and nearby terrain. They also receive input about other crickets in their current cell and adjacent cells. Male crickets can send mating calls, females can hear these calls, and all crickets can choose to move to an adjacent cell, mate with another cricket in their cell, or remain where they are.

Each cricket in the world has its own genome, a double-stranded list of values which can be configured to code for several traits. Each cricket's genome codes for its sex, a male pulse rate (only expressed by males, dormant in females), a female pulse-rate preference (only expressed by females, dormant in males), an "endogenous fitness" rating, and an ideal terrain type. Since male crickets "sing" to females, and females decide which males to move closer to and eventually mate with, pulse rate and pulse-rate preference form the backbone of interactions in cricketsim.

Each run of the cricketsim simulation consists of creating a lattice world of the given size and terrain layout, creating an initial population of crickets according to the simulator parameters and genotype parameters, and distributing these crickets among the various cells in the world. The simulation proceeds by presenting each cricket with the input it should receive, given its current location and the state of nearby crickets (e.g., females can hear nearby males who signaled in the previous timestep). Within a timestep, each cricket either moves or stays in its current cell, males send signals that females will hear next timestep, females respond to signals they heard last timestep, some females mate with males (instantly producing offspring), any new offspring are placed in the world, and the timestep ends. Each new timestep proceeds from the state of the world at the end of the last timestep. The simulation ends when the user-supplied maximum number of timesteps has been reached (usually 5000 timesteps is enough to study the dynamics of the cricket populations).

Males decide to move or remain where they are based on terrain and the density of males in the current cell. They always emit a mating signal. Females decide to move or remain where they are based on terrain and whether or not they have heard a male's song they prefer more than those males in their current cell. A female may also choose to mate with a male in her current cell, again, based on how much she likes his song. Male pulse rate and female pulse-rate preference play an important role in these decisions.

The following sections describe each aspect of the cricketsim simulator in more detail.

World

The world that the crickets inhabit is a rectangular grid of cells. Each cell can contain 0 or more crickets. Each cell also has a terrain type. Types are simply numbers, but can be thought of as different kinds of ground cover, elevation, temperature ranges, etc. So type 1 might be leaf litter in a forest, type 2 might be a grassy plain, and type 3 might be sand dunes. Crickets attempt to move to terrain types that match their own "ideal" terrain types. So those crickets which prefer grassy plains will move to type 2 cells. If they are already in a type 2 cell, and this is their ideal terrain, then they would normally stay in that cell. Other circumstances besides terrain type can cause crickets to move.

The number of rows (height) and columns (width) of the world can be specified. In addition, the number of terrain types and the distribution of the terrain types across the cells can also be specified. Usually, only 2 terrain types are used. A random layout means that each cell is given a random type. A "tile size" can be specified so that the random layouts occur in clusters of 2x2, 3x3 or other square cell clusters. A striped layout assigns adjacent columns of cells to the same terrain type. If there were 10 columns in the world and 5 terrain types, a striped layout makes 5 double-column stripes of cells, each stripe containing cells all of the same terrain type.
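
The two layouts described above can be sketched as follows. This is a minimal illustration, not the cricketsim source: the function names, the numbering of terrain types from 1, and the requirement that the column count divide evenly by the number of terrain types are all assumptions made for the example.

```python
import random


def striped_layout(rows, cols, n_types):
    """Vertical stripes of equal width, terrain types numbered from 1.

    Assumption (not from cricketsim): cols divides evenly by n_types.
    """
    if cols % n_types != 0:
        raise ValueError("this sketch needs cols to be a multiple of n_types")
    stripe_width = cols // n_types
    one_row = [col // stripe_width + 1 for col in range(cols)]
    return [list(one_row) for _ in range(rows)]


def random_tiled_layout(rows, cols, n_types, tile_size=1, rng=None):
    """Random terrain assigned per tile_size x tile_size cluster of cells."""
    rng = rng or random.Random()
    tiles_down = (rows + tile_size - 1) // tile_size
    tiles_across = (cols + tile_size - 1) // tile_size
    # One random terrain type per tile; every cell inherits its tile's type.
    tile_types = [[rng.randint(1, n_types) for _ in range(tiles_across)]
                  for _ in range(tiles_down)]
    return [[tile_types[r // tile_size][c // tile_size] for c in range(cols)]
            for r in range(rows)]


striped = striped_layout(rows=4, cols=10, n_types=5)
print(striped[0])  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

tiled = random_tiled_layout(rows=4, cols=6, n_types=2, tile_size=2,
                            rng=random.Random(0))
for row in tiled:
    print(row)
```

With 10 columns and 5 terrain types, the striped layout yields the 5 double-column stripes from the example above. With a tile size of 2, the random layout repeats each random choice over a 2x2 cluster.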

Crickets

Each cricket has its own genome, which codes for its traits: sex, pulse rate, pulse-rate preference, endogenous fitness, and ideal terrain type. The values for each cricket's genome come from its parents through recombination and through occasional mutations (rare; a user-specified probability). The initial population's genomes (i.e., the alleles) are randomly generated. How the genome determines each trait is explained in the section on Genomes. Each cricket in the world is either male or female. Crickets have a random lifespan, given at birth, chosen from a normal distribution (30 +/-5 in our sims?). Each cricket's age begins at 1 (at birth), and they age 1 timestep for every timestep they are in the simulation. When their age reaches their lifespan, they die and are removed from the world. Other endogenous and exogenous factors can influence how quickly they age. This is how the simulation implements the (potentially) deleterious effects of hybridity. In this sense, a cricket's age is really more like its total "life energy": when this total is used up, the cricket dies. The fittest crickets use their life energy at a rate of 1 unit per timestep in the simulation, but unfit crickets may use more of this life energy per timestep.

Each cricket has an "endogenous fitness" value given by its genome. This value, along with a weighting parameter provided to the simulator, determines how much more the cricket ages during each timestep it exists in the world. For example, if a cricket's endogenous fitness value is 0, it will only age as normal, 1 timestep per timestep spent in the world. If its endogenous fitness is 4, on the other hand, the cricket will age 1+4=5 timesteps per timestep spent in the world. A user-provided constant can be set to any value to magnify or minimize this effect. If that constant is set to 0, crickets will suffer no aging penalty from their endogenous fitness values.

In addition to "endogenous fitness", crickets also have "exogenous fitness", most directly affected by their ideal terrain type. For simple worlds with 2 terrain types (TT=1 and TT=2), all crickets will have an ideal terrain type that is within this range. A cricket's ideal terrain type (cTT) is actually a real value, in this case 1 <= cTT <= 2. Pure species crickets would have cTT=1 or cTT=2, while hybrids would have some value in between (a hybrid of two pure crickets would have a cTT=1.5 in this example). This terrain factor also affects a cricket's age. If the cricket is in a cell such that TT=cTT, then it has no extra aging penalty. However, a cricket in a cell with non-ideal terrain would age more than the normal 1 timestep per timestep spent in the simulation. The extra aging per timestep is K|cTT-TT|, where K is a constant provided by the user. If K=0, exogenous fitness would never be a factor in that particular run.
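
The two aging penalties can be combined in a short sketch. The function and argument names are hypothetical, and two assumptions are made that the text does not confirm: that the endogenous and exogenous penalties simply add, and that the constant K corresponds to the "Terrain sensitivity" parameter mentioned later in this document. Any rounding the simulator may apply is left out.

```python
def age_increment(endogenous_fitness, ideal_terrain, cell_terrain,
                  genetic_age_factor=1.0, terrain_sensitivity=1.0):
    """Timesteps a cricket ages during one timestep in the world.

    Assumption: the two penalties are additive on top of the base rate of 1.
    """
    endogenous_penalty = genetic_age_factor * endogenous_fitness
    # K * |cTT - TT|, with K passed in as terrain_sensitivity.
    exogenous_penalty = terrain_sensitivity * abs(ideal_terrain - cell_terrain)
    return 1.0 + endogenous_penalty + exogenous_penalty


# Endogenous fitness 4 on ideal terrain: 1 + 4 = 5 timesteps per timestep.
print(age_increment(4, ideal_terrain=1.0, cell_terrain=1))  # 5.0

# A first-generation hybrid (cTT = 1.5) on TT = 2 with K = 2.
print(age_increment(0, ideal_terrain=1.5, cell_terrain=2,
                    terrain_sensitivity=2.0))  # 2.0

# Setting both constants to 0 removes every aging penalty.
print(age_increment(4, 1.5, 2, genetic_age_factor=0.0,
                    terrain_sensitivity=0.0))  # 1.0
```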

Every timestep, all of the crickets are allowed to move. Males move based on the density of males in their cell and on the nearby terrain types. Females move based on terrain types and on which male signal they prefer (if any). Terrain-based movement is simple: if a cricket's ideal terrain type doesn't match its current terrain type, and it can find a better terrain type in an adjacent cell, it will move to that cell. Otherwise, it will stay. Males may also move based on how dense the male population is in their current cell. A user-defined value specifies the ideal density (say, 10 males), and a male in a cell with fewer or more males will move from that cell to an adjacent cell with a more ideal density. Females may also move based on hearing a male in a nearby cell and preferring his song over others in her own cell. If she hears a male in an adjacent cell whose song she prefers more, she will move to that cell. If she is in the same cell as the male whose song she prefers most, she will mate with that male. The user specifies which of these movement constraints wins in a tie for both males and females.

Females hear all males who sing near them (the male's signal range is user-defined). Each female must choose which male she will listen to and ignore all others. She does this with her preference function, which is based on several user-defined parameters as well as the males' pulse rates ("PR", given by each male's genome) and her own pulse-rate preference ("PR preference", given by her genome). The female preference functions are best-proportional (choose male whose PR is closest to her PR preference, with higher PR preference values implying a larger range of tolerance), best-inverse-proportional (same as best-proportional, but her tolerance for PR deviations is much smaller as her PR preference value gets higher), best-fixed (as with best-proportional, but the female's tolerance for PR deviations is the same regardless of her PR preference value), open-ended increasing (female prefers male with highest PR), and open-ended decreasing (female prefers male with lowest PR). A difference between the male's PR and the female's PR preference is computed, and if the difference is too great (based on her preference function), she will reject the male entirely, even if he has the best PR nearby. Otherwise, the preference function gives a "score" to each male's PR, and the female chooses to follow/mate with the male with the best score according to her preference function.

For example, if female1 has a PR preference value of 1.0, female2 has a value of 3.0, and a male sings at a PR of 2.0, the difference between the female preferences and the male's PR are computed; both differences are 1.0. For a best-fixed preference function, both females would have the same response to the male (either both would reject him or both would accept him, depending on the user-defined "preference func tolerance"). For best-proportional, female1 would be less likely to accept the male than female2 since her tolerance range is smaller - her preference value is less than female2's. For best-inverse-proportional, the opposite is true. The actual result (accept/ignore) would depend on the "preference func tolerance" parameter. For open-ended-increasing, female1 would accept the male while female2 would reject him (his PR is below her preference value). The opposite result would be the case for open-ended-decreasing.
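
The accept/reject step of this example can be reproduced with a small sketch. The exact formulas used by cricketsim are not given here, so the scaling rules below are assumptions chosen only to match the qualitative description: tolerance equals the "preference func tolerance" for best-fixed, is multiplied by the PR preference for best-proportional, and is divided by it for best-inverse-proportional. The function names and the tolerance values are hypothetical.

```python
def tolerance(kind, pr_preference, base_tolerance):
    """Largest |PR - preference| a female accepts (assumed scaling rules)."""
    if kind == "best-fixed":
        return base_tolerance
    if kind == "best-proportional":
        return base_tolerance * pr_preference
    if kind == "best-inverse-proportional":
        return base_tolerance / pr_preference
    raise ValueError(f"unknown preference function: {kind}")


def accepts(kind, pr_preference, male_pr, base_tolerance=1.0):
    """True if the female does not reject the male outright."""
    if kind == "open-ended-increasing":
        return male_pr >= pr_preference
    if kind == "open-ended-decreasing":
        return male_pr <= pr_preference
    difference = abs(male_pr - pr_preference)
    return difference <= tolerance(kind, pr_preference, base_tolerance)


female1, female2, male_pr = 1.0, 3.0, 2.0
for kind, tol in [("best-fixed", 1.0),
                  ("best-proportional", 0.5),
                  ("best-inverse-proportional", 1.5),
                  ("open-ended-increasing", 0.0),
                  ("open-ended-decreasing", 0.0)]:
    print(kind, accepts(kind, female1, male_pr, tol),
          accepts(kind, female2, male_pr, tol))
```

Under these assumed rules, best-fixed gives both females the same answer, best-proportional makes female1 stricter than female2, best-inverse-proportional reverses that, and the two open-ended functions give opposite results, as in the example above. Scoring among several accepted males is not shown.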

When a female has decided to mate with a male (they must both be in the same cell for this to happen), they produce a random number of offspring, chosen from a normal distribution whose parameters are user-defined. Offspring are created by sexually combining the male and female's genomes to create new, diploid crickets. These new crickets are placed randomly in the world, near their parents. The exact placement algorithm is determined by user parameters. After mating, neither parent can mate until the "refractory period" (e.g., 10 timesteps) has elapsed for them - another user-defined value.

Genomes

Each cricket's genome is diploid. The number of loci in a genome, as well as its entire structure, is determined by user-defined parameters before the run begins. All crickets will have the same genetic structure, even though the alleles may vary wildly.

Each gene has a locus, and the only purpose of this locus is to make recombination more spatially realistic. The alleles are stored in a list, but each allele is associated with a locus so that even if it is adjacent to another allele in the list, its "virtual" location may be far away. In such a case, a crossover is quite likely to occur between the two alleles, since they are far apart virtually even though they are adjacent in the list of alleles coding for this cricket. The justification for using virtual loci is that a real cricket's genome contains many genes which play no functional role in this simulation. Their absence is accounted for by placing the genes that do play a functional role at various virtual locations along a genome which should be considered 100.0 units long.

An example of a short, simple genome, with loci and alleles (allele1/allele2 are a pair since the genome is diploid).

Locus:   0.5 | 57.2 | 65.9 | 87.0
Allele1: .3  | .1   | .3   | .2
Allele2: .2  | .2   | .3   | .1
In this example, there are four loci in the genome, and the first locus contains .3 and .2 as values. The next locus is adjacent to the first in the list, but is located more than half the total virtual length of the genome away from it (57.2 - 0.5 = 56.7 units out of 100.0). Thus, recombination is more likely to happen between these two loci than between any other adjacent loci in this genome.
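
One way to picture the effect of virtual loci is to compute the gap between each pair of adjacent loci. The sketch below assumes that the chance of a crossover falling in a gap is simply the gap's share of the 100.0-unit genome; the actual mapping from distance to crossover probability in cricketsim is not stated here, and the function name is hypothetical.

```python
GENOME_LENGTH = 100.0


def crossover_shares(loci, genome_length=GENOME_LENGTH):
    """Fraction of the virtual genome lying between each adjacent locus pair.

    Assumption: crossover likelihood is proportional to virtual distance.
    """
    return [(right - left) / genome_length
            for left, right in zip(loci, loci[1:])]


loci = [0.5, 57.2, 65.9, 87.0]
shares = crossover_shares(loci)
for (left, right), share in zip(zip(loci, loci[1:]), shares):
    print(f"{left} to {right}: {share:.3f}")

# Index of the gap most likely to contain a crossover.
widest = max(range(len(shares)), key=shares.__getitem__)
print(widest)  # 0, the gap between loci 0.5 and 57.2
```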

Each gene also comes with a marker. If there are 2 species in the simulation, then the first species will be marked with "A" (this marker is actually user-defined), and the second will be marked with "B". When each cricket is created, its overall "hybridity" is determined by simply calculating how many loci have both alleles as "A"s or "B"s, and then determining which of "A" or "B" is in the majority.

Genome from above, but with markers:

Locus:   0.5 | 57.2 | 65.9 | 87.0
Allele1: .3  | .1   | .3   | .2
Marker1:  A  |  B   |  B   |  A
Allele2: .2  | .2   | .3   | .1
Marker2:  A  |  A   |  B   |  A
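In this case, the genome is mostly "A"-type, but "B" markers appear at two loci: locus 57.2 is mixed ("B"/"A") and locus 65.9 is "B" on both strands. Counting markers, 5 of the 8 are "A", so the genome might be considered 62.5% (5/8) "A".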
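
The marker tally for this genome can be checked with a few lines. The helper names are hypothetical, and counting individual markers (rather than whole loci) is an assumption taken from the 5/8 figure in the example.

```python
def marker_fraction(marker1, marker2, species="A"):
    """Share of all markers, across both strands, that belong to `species`."""
    markers = list(marker1) + list(marker2)
    return markers.count(species) / len(markers)


def loci_with_marker(marker1, marker2, species):
    """Number of loci where at least one strand carries `species`."""
    return sum(1 for a, b in zip(marker1, marker2) if species in (a, b))


marker1 = ["A", "B", "B", "A"]
marker2 = ["A", "A", "B", "A"]

print(marker_fraction(marker1, marker2, "A"))   # 0.625
print(loci_with_marker(marker1, marker2, "B"))  # 2
```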

Encoding of traits

When setting up a run, the user specifies the structure of the genome. Each genome has one sex locus. The pulse rate (PR), PR preference and "fitness" traits all have one or more loci. Both the number of loci and their virtual locations are user-specified. Initial values for the alleles at these loci (for the initial population of crickets added to the world) are randomly generated from normal distributions whose means and standard deviations are also user-defined. All crickets in a run, regardless of species, have the same genetic structure. This enables any cricket from any species to breed with another during the simulation run, even if the offspring themselves have poorly-fit traits for their environment.

The genome has one locus to determine sex. The value for an allele at the sex locus is either "X" or "O" (absence of the "X" allele). Females are "XX", while males are "XO" or "OX". "OO" is prevented by the simulator and another value will be chosen if the sex locus is set to "OO".
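
A minimal sketch of the sex locus rule follows. The function names are hypothetical, and redrawing the whole pair when "OO" comes up is only one plausible reading of "another value will be chosen".

```python
import random


def sex_of(allele1, allele2):
    """Map the two sex-locus alleles to a sex."""
    pair = allele1 + allele2
    if pair == "XX":
        return "female"
    if pair in ("XO", "OX"):
        return "male"
    raise ValueError('"OO" is not a permitted sex locus')


def draw_sex_alleles(rng):
    """Draw a random allele pair, redrawing whenever "OO" comes up (assumed)."""
    while True:
        pair = (rng.choice("XO"), rng.choice("XO"))
        if pair != ("O", "O"):
            return pair


rng = random.Random(42)
draws = [draw_sex_alleles(rng) for _ in range(10)]
print([sex_of(a, b) for a, b in draws])
```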

Crickets have Npr (usually 10) loci which encode for pulse rate (PR). The PR trait for a cricket is calculated simply by adding up all of its PR alleles. The same is true for PR preference.

A cricket's endogenous fitness is calculated by sampling a number of pairs of loci from the "fitness loci". If there were 5 fitness loci, 2 random loci would be chosen (with replacement) from this set. They would be compared for mismatches. Then another 2 loci would be chosen and checked for mismatches. A mismatch is looked for within each locus of the pair (are its two alleles the same?) and between the two loci (same-strand checks only), yielding four possible mismatches per pair. The parameter "#pairs of age loci to sample for heterozygosity" specifies how many pairs will be chosen in this manner. The total number of mismatches is the cricket's "fitness mismatch score", or Sf. This value is multiplied by the "Genetic age factor" parameter to determine how many extra timesteps the cricket will age for every timestep it is in the world.

A cricket's ideal terrain type (serving as a type of exogenous mating barrier) is determined using the same "fitness loci" as for the endogenous fitness trait. This trait's value is simply the average of all of the alleles found at these loci.

The example below shows a genome with several alleles coding for each trait. This cricket is male, with an "XO" at its sex locus. If this cricket's PR trait were specified by 2 loci, one at 0.5 and the other at 57.2, then this cricket's PR (pulse rate) would be .3+.2+.1+.2 = 0.8 (adding up values at both alleles in this diploid genome). Supposing that the next two loci code for PR preference, this cricket's PR preference would be .4+.28+.2+.19 = 1.07. Since the cricket is male, it will not express the PR preference trait, but it will still pass these genes on to its offspring. If the next 3 loci were to code for the "fitness" trait, then suppose that 2 pairs are to be chosen ("#pairs of age loci to sample for heterozygosity" = 2). If the first pair is <87.0,87.4>, the first locus has no internal mismatch, the second does (2.0 is not the same as 1.0), and there is a mismatch in "Allele1" between 87.0 and 87.4 but none in "Allele2". So there are 2 mismatches here. A second pair of loci might be 87.2 and 87.4. There are mismatches within each locus, and mismatches between the loci on both strands, making 4 mismatches. The total is 6 mismatches, and if the "Genetic age factor" were .3, this would mean the cricket ages 6x0.3=1.8 (rounded down to 1) timesteps more per timestep it is in the world. This cricket's ideal terrain type is (1.0+1.0+1.0+2.0+2.0+1.0)/6 = 8.0/6 = 1.33 (approximately). Thus, the cricket is more at home in TT=1 than TT=2, but it might still incur a small age penalty even in TT=1 (depending on the size of the "Terrain sensitivity" parameter).

Locus:   0.5 | 57.2 | 65.9 | 74.1 | 87.0 | 87.2 | 87.4 | 99.0
Allele1: .3  | .1   | .4   | .2   | 1.0  | 1.0  | 2.0  | "X"
Marker1:  A  |  B   |  A   |  B   |  A   |  A   |  B   |  A
Allele2: .2  | .2   | .28  | .19  | 1.0  | 2.0  | 1.0  | "O"
Marker2:  A  |  A   |  A   |  B   |  A   |  B   |  A   |  A
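
The trait calculations for this genome can be worked through in code. This is a sketch under stated assumptions: the data layout and function names are hypothetical, the two sampled pairs are fixed to the ones used in the example rather than drawn at random, and rounding down the extra aging follows the "rounded down" remark in the example.

```python
import math

# locus -> (Allele1, Allele2), copied from the table above (sex locus omitted)
genome = {
    0.5: (0.3, 0.2), 57.2: (0.1, 0.2),                     # pulse rate
    65.9: (0.4, 0.28), 74.1: (0.2, 0.19),                  # PR preference
    87.0: (1.0, 1.0), 87.2: (1.0, 2.0), 87.4: (2.0, 1.0),  # fitness
}
PR_LOCI = [0.5, 57.2]
PREF_LOCI = [65.9, 74.1]
FITNESS_LOCI = [87.0, 87.2, 87.4]


def additive_trait(genome, loci):
    """Sum of both alleles at every locus coding for the trait."""
    return sum(sum(genome[locus]) for locus in loci)


def pair_mismatches(genome, locus_a, locus_b):
    """Two within-locus checks plus two same-strand checks between loci."""
    a1, a2 = genome[locus_a]
    b1, b2 = genome[locus_b]
    return sum([a1 != a2, b1 != b2, a1 != b1, a2 != b2])


def ideal_terrain(genome, loci):
    """Average of all alleles at the fitness loci."""
    values = [value for locus in loci for value in genome[locus]]
    return sum(values) / len(values)


pulse_rate = additive_trait(genome, PR_LOCI)
pr_preference = additive_trait(genome, PREF_LOCI)
sampled_pairs = [(87.0, 87.4), (87.2, 87.4)]  # fixed here, random in a run
mismatch_score = sum(pair_mismatches(genome, a, b) for a, b in sampled_pairs)
extra_aging = math.floor(mismatch_score * 0.3)  # Genetic age factor = .3
terrain = ideal_terrain(genome, FITNESS_LOCI)

print(round(pulse_rate, 2))     # 0.8
print(round(pr_preference, 2))  # 1.07
print(mismatch_score)           # 6
print(extra_aging)              # 1
print(round(terrain, 2))        # 1.33
```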

Primary Parameters for Experimentation

TBD

Data Collected

Data are collected at every timestep of the simulation. These data consist of tallies, averages, computations and snapshots of various states of the individual crickets, cricket subpopulations, the whole population and the world.

The primary forms of data collected are data per cricket, data for each cricket "species" (based on crude genetic categorizations), data for males, data for females, and data on the spatial layout of the crickets.

Individual-based data consist of averages of how many offspring were produced by each cricket in its lifetime, as well as the pulse rates and pulse-rate preferences of the crickets.

Population-based data include classifying each cricket (over time) into a histogram bin based on its "relative hybridity" (how "pure" of an "A" or "B" genome it has), and then computing mating success or number of individuals for each bin.

Spatial data attempt to show the spatial distribution of crickets based on their predominant genotype ("A" or "B"). Since most crickets will be at least partially hybrid after the simulation has run for a while, these classifications are only approximate.

Details on each kind of data collected are given in the user's manual.

Experiments and Parameters

The following experiments are patterned after predictions made by Jiggins and Mallet in their paper, "Bimodal hybrid zones and speciation", TREE 15(6) June 2000.

TBD