README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# virtualPollen

[![CRAN_Release_Badge](http://www.r-pkg.org/badges/version-ago/virtualPollen)](https://CRAN.R-project.org/package=virtualPollen)
[![CRAN_Download_Badge](http://cranlogs.r-pkg.org/badges/virtualPollen)](https://CRAN.R-project.org/package=virtualPollen)

The goal of virtualPollen is to provide the tools to simulate pollen curves over millenial time-scales generated by virtual taxa with different life traits (life-span, fecundity, growth-rate) and niche features (niche position and breadth) as a response to virtual environmental drivers with a given temporal autocorrelation. It furthers allow to simulate specific data properties of fossil pollen datasets, such as sediment accumulation rate, and depth intervals between consecutive pollen samples. The simulation outcomes are useful to better understand the role of plant traits, niche properties, and climatic variability in defining the shape of pollen curves.

## Installation

You can install the released version of virtualPollen from [GitHub](/~https://github.com/BlasBenito/virtualPollen) or soon from [CRAN](https://CRAN.R-project.org) with:


```{r, eval=FALSE}
#from GitHub (development version)
library(devtools)
install_github("blasbenito/virtualPollen")

#from CRAN
install.packages("virtualPollen")
```

```{r, message=FALSE, warning=FALSE, results="hide"}
library(virtualPollen)
library(viridis)
```


## Workflow

The basic workflow consists of the following steps:

**1.** **Generate virtual drivers** with *simulateDriverS*. The user can define a random seed, a time vector, length of the autocorrelation structure of the driver, minimum and maximum values, and names of the drivers. Note that the final length of the temporal autocorrelation is only approximate, and that **time** runs from left to right, meaning that older samples have lower time values.

```{r, fig.width=9, fig.height=7}
#generating two drivers with different autocorrelation lengths
myDrivers <- simulateDriverS(
 random.seeds=c(60, 120),
 time=1:5000,
 autocorrelation.lengths=c(200, 600),
 output.min=c(0,0),
 output.max=c(100, 100),
 driver.names=c("A", "B"),
 filename=NULL
)

#checking output
str(myDrivers)

#note that individual drivers can be created
#with the function simulateDriver as well
```

**2.** **Define the traits of the virtual taxa**. Note that the dataframe with the parameters can be either filled with numeric vectors, or manually, by using the function [editData](https://CRAN.R-project.org/package=editData). Please, check the help of the function *parametersDataframe* and the vignette to better understand the meaning of the traits.

```{r, fig.width=9, fig.height=4}
#generating template of the dataframe
myParameters <- parametersDataframe(rows=2)

#checking columns in parameters
str(myParameters)

#filling it with vectors
myParameters[1,] <- c("Species 1", 50, 10, 10, 0.5, 0, 100, 2000, 0.5, 0.5, 75, 10, 75, 15, 200, 600)
myParameters[2,] <- c("Species 2", 1000, 60, 2, 0.05, 0, 100, 2000, 0.5, 0.5, 50, 10, 50, 15, 200, 600)

#fixing column types
myParameters <- fixParametersTypes(x = myParameters)

#visualizing main parameters
parametersCheck(parameters = myParameters, 
                species = "all",
                drivers = myDrivers)

```

**3.** **Simulate population dynamics and pollen productivity** of the virtual taxa.

```{r, figh.width=9, fig.height=9}
#simulating population dynamics
mySimulation <- simulatePopulation(parameters = myParameters,
                                   species = "all",
                                   drivers = myDrivers)

#mySimulation is a list with two dataframes
#let's look at the first one
mySimulation.df <- mySimulation[[1]]
str(mySimulation.df)
#note that "Time" has negative values
#they define a burn-in period used
#to warm-up the simulation.
#can be removed with:
mySimulation.df <- mySimulation.df[mySimulation.df$Time > 0, ]


#plotting the simulation
plotSimulation(simulation.output = mySimulation,
               line.size = 0.4)

```

```{r, fig.width=9, fig.height=4}
#comparing pollen curves for a given time
compareSimulations(simulation.output = mySimulation,
                   species = "all",
                   columns = c("Suitability", "Pollen"),
                   time.zoom = c(2000, 3000),
                   text.size = 14)
```

**4.** **Generate a virtual accumulation rate**. Note that virtual accumulation rates generated by the function *simulateAccumulationRate* are not intended to be realistic (similar to existing accumulation rate curves), but to provide the means to distort the pollen data in similar ways as real accumulation rate curves do. Still, the users can use their own accumulation rates, as long as they have the same estructure as the output of *simulateAccumulationRate*.

```{r, fig.width=9, fig.height=2}

#simulationg accumulation rate
myAccumulationRate <- simulateAccumulationRate(
 seed = 50,
 time = 1:5000,
 output.min = 10,
 output.max = 40,
 direction = -1,
 plot = TRUE
 )

#checking the output
str(myAccumulationRate)
#all samples in mySimulation with the same grouping variable
#are to be aggregated into the same centimeter (in the next step)
```

**5.** Apply the **virtual accumulation rate** to the simulation, and sample the outcome at different **depth intervals**. 

```{r}
#applying accumulation rate and
#sampling at: every cm, 2 cm and 3 cm between samples
mySimulation.aggregated <- aggregateSimulation(
  simulation.output = mySimulation,
  accumulation.rate = myAccumulationRate,
  sampling.intervals = c(2,4)
)

#the output is a matrix-like list
#rows are simulated taxa
#first column is the original data
#second column is the data aggregated by the accumulation rate
#rest of columns represent sampling intervals

#comparing data of the first row
plot(mySimulation.aggregated[[1,1]]$Time, 
     mySimulation.aggregated[[1,1]]$Pollen,
     type = "l", 
     xlim = c(1000, 2000),
     ylim = c(0, 1200),
     ylab = "Pollen",
     xlab = "Time (years)",
     col=viridis(4)[4])
lines(mySimulation.aggregated[[1,2]]$Time, 
     mySimulation.aggregated[[1,2]]$Pollen,
     col=viridis(4)[3])
lines(mySimulation.aggregated[[1,3]]$Time, 
     mySimulation.aggregated[[1,3]]$Pollen,
     col=viridis(4)[2])
lines(mySimulation.aggregated[[1,4]]$Time, 
     mySimulation.aggregated[[1,4]]$Pollen,
     col=viridis(4)[1])
```

**6.** Optional: **re-interpolate the data into a regular time grid**. Comparing the different pollen curves through methods such as regression requires them to be in the same temporal grid (a.k.a, have the same number of cases). The function *toRegularTime* handles that seamlessly.

```{r, message=FALSE}
#interpolating at 50 years resolution.
mySimulation.regular <- toRegularTime(
  x = mySimulation.aggregated,
  time.column = "Time",
  interpolation.interval = 50,
  columns.to.interpolate = c("Suitability", "Pollen")
)

#comparing interpolated data
plot(mySimulation.regular[[1,1]]$Time, 
     mySimulation.regular[[1,1]]$Pollen,
     type = "l", 
     xlim = c(1000, 2000),
     ylim = c(0, 1200),
     ylab = "Pollen",
     xlab = "Time (years)",
     col=viridis(4)[4])
#note that this function modifies the original dataset!
lines(mySimulation.regular[[1,2]]$Time, 
     mySimulation.regular[[1,2]]$Pollen,
     col=viridis(4)[3])
lines(mySimulation.regular[[1,3]]$Time, 
     mySimulation.regular[[1,3]]$Pollen,
     col=viridis(4)[2])
lines(mySimulation.regular[[1,4]]$Time, 
     mySimulation.regular[[1,4]]$Pollen,
     col=viridis(4)[1])
```