Prepare to run BiSSE (Binary State Speciation and Extinction) on a phylogenetic tree and character distribution. This function creates a likelihood function that can be used in maximum likelihood or Bayesian inference.

make.bisse(tree, states, unresolved=NULL, sampling.f=NULL, nt.extra=10,
           strict=TRUE, control=list())
starting.point.bisse(tree, q.div=5, yule=FALSE)

Arguments

tree

An ultrametric bifurcating phylogenetic tree, in ape “phylo” format.

states

A vector of character states, each of which must be 0 or 1, or NA if the state is unknown. This vector must have names that correspond to the tip labels in the phylogenetic tree (tree$tip.label). For tips corresponding to unresolved clades, the state should be NA.

unresolved

Unresolved clade information: see section below for structure.

sampling.f

Vector of length 2 with the estimated proportion of extant species in state 0 and 1 that are included in the phylogeny. A value of c(0.5, 0.75) means that half of species in state 0 and three quarters of species in state 1 are included in the phylogeny. By default all species are assumed to be known.

nt.extra

The number of species modelled in unresolved clades (this is in addition to the largest observed clade).

control

List of control parameters for the ODE solver. See details below.

strict

The states vector is always checked to make sure that the values are 0 and 1 only. If strict is TRUE (the default), then the additional check is made that every state is present. The likelihood models tend to be poorly behaved where states are missing.

q.div

Ratio of diversification rate to character change rate. Eventually this will be changed to allow for Mk2 to be used for estimating q parameters.

yule

Logical: should starting parameters be Yule estimates rather than birth-death estimates?

Details

make.bisse returns a function of class bisse. This function has argument list (and default values)


    f(pars, condition.surv=TRUE, root=ROOT.OBS, root.p=NULL,
      intermediates=FALSE)
  

The arguments are interpreted as

  • pars A vector of six parameters, in the order lambda0, lambda1, mu0, mu1, q01, q10.

  • condition.surv (logical): should the likelihood calculation condition on survival of two lineages and the speciation event subtending them? This is done by default, following Nee et al. 1994.

  • root: Behaviour at the root (see Maddison et al. 2007, FitzJohn et al. 2009). The possible options are

    • ROOT.FLAT: A flat prior, weighting \(D_0\) and \(D_1\) equally.

    • ROOT.EQUI: Use the equilibrium distribution of the model, as described in Maddison et al. (2007).

    • ROOT.OBS: Weight \(D_0\) and \(D_1\) by their relative probability of observing the data, following FitzJohn et al. 2009: $$D = D_0\frac{D_0}{D_0 + D_1} + D_1\frac{D_1}{D_0 + D_1}$$

    • ROOT.GIVEN: Root will be in state 0 with probability root.p[1], and in state 1 with probability root.p[2].

    • ROOT.BOTH: Don't do anything at the root, and return both values. (Note that this will not give you a likelihood!).

  • root.p: Root weightings for use when root=ROOT.GIVEN. sum(root.p) should equal 1.

  • intermediates: Add intermediates to the returned value as attributes:

    • cache: Cached tree traversal information.

    • intermediates: Mostly branch end information.

    • vals: Root \(D\) values.

    At this point, you will have to poke about in the source for more information on these.

starting.point.bisse produces a heuristic starting point to start from, based on the character-independent birth-death model. You can probably do better than this; see the vignette, for example. bisse.starting.point is the same code, but deprecated in favour of starting.point.bisse - it will be removed in a future version.

Unresolved clade information

Since 0.10.10 this is no longer supported. See the package README for more information.

This must be a data.frame with at least the four columns

  • tip.label, giving the name of the tip to which the data applies

  • Nc, giving the number of species in the clade

  • n0, n1, giving the number of species known to be in state 0 and 1, respectively.

These columns may be in any order, and additional columns will be ignored. (Note that column names are case sensitive).

An alternative way of specifying unresolved clade information is to use the function make.clade.tree to construct a tree where tips that represent clades contain information about which species are contained within the clades. With a clade.tree, the unresolved object will be automatically constructed from the state information in states. (In this case, states must contain state information for the species contained within the unresolved clades.)

ODE solver control

The differential equations that define the BiSSE model are solved numerically using ODE solvers from the GSL library or deSolve's LSODA. The control argument to make.bisse controls the behaviour of the integrator. This is a list that may contain elements:

  • tol: Numerical tolerance used for the calculations. The default value of 1e-8 should be a reasonable trade-off between speed and accuracy. Do not expect too much more than this from the abilities of most machines!

  • eps: A value that when the sum of the D values drops below, the integration results will be discarded and the integration will be attempted again (the second-chance integration will divide a branch in two and try again, recursively until the desired accuracy is reached). The default value of 0 will only discard integration results when the parameters go negative. However, for some problems more restrictive values (on the order of control$tol) will give better stability.

  • backend: Select the solver. The three options here are

    • gslode: (the default). Use the GSL solvers, by default a Runge Kutta Kash Carp stepper.

    • deSolve: Use the LSODA solver from the deSolve package. This is quite a bit slower at the moment.

deSolve is the only supported backend on Windows.

See also

constrain for making submodels, find.mle for ML parameter estimation, mcmc for MCMC integration, and make.bd for state-independent birth-death models.

The help pages for find.mle has further examples of ML searches on full and constrained BiSSE models.

References

FitzJohn R.G., Maddison W.P., and Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Syst. Biol. 58:595-611.

Maddison W.P., Midford P.E., and Otto S.P. 2007. Estimating a binary character's effect on speciation and extinction. Syst. Biol. 56:701-710.

Nee S., May R.M., and Harvey P.H. 1994. The reconstructed evolutionary process. Philos. Trans. R. Soc. Lond. B Biol. Sci. 344:305-311.

Author

Richard G. FitzJohn

Examples

## Due to a change in sample() behaviour in newer R it is necessary to
## use an older algorithm to replicate the previous examples
if (getRversion() >= "3.6.0") {
  RNGkind(sample.kind = "Rounding")
}
#> Warning: non-uniform 'Rounding' sampler used
pars <- c(0.1, 0.2, 0.03, 0.03, 0.01, 0.01)
set.seed(4)
phy <- tree.bisse(pars, max.t=30, x0=0)

## Here is the 52 species tree with the true character history coded.
## Red is state '1', which has twice the speciation rate of black (state
## '0').
h <- history.from.sim.discrete(phy, 0:1)
plot(h, phy)


lik <- make.bisse(phy, phy$tip.state)
lik(pars) # -159.71
#> [1] -159.71