This toolbox packages a set of stochastic processes for simulating prices and rates, with the aim of creating synthetic datasets for quantitative back-testing of trading strategies and asset allocation methods.
TL;DR
- Matlab SDK for the stochastic simulation of stock prices and bond rates.
- Additionally, it includes some utilities to sample data, such as order flow and information-driven bars.
- Please visit the site documentation for details.
Rationale
Simulating synthetic stock prices and bond rates provides an alternative back-testing method: it uses history to generate datasets whose statistical characteristics are estimated from the observed data. This allows back-testing on a large sample of unseen scenarios, reducing the likelihood of overfitting to a particular historical dataset. Because each trading strategy needs an implementation tactic (a.k.a. trading rules) to enter, maintain, and exit its positions on each instrument, simulating thousands of different scenarios is essential. However, there is an implicit tradeoff.
Historical data show the "real" state of financial instruments, based on the realized combination of events that affected each market. Consequently, a traditional portfolio manager designs a set of rules that optimize or hedge profits for that specific combination of events. An investment strategy whose parameters are fitted to a single combination of events is therefore likely to fail when the future unfolds differently.
Such a framework for designing trading strategies is limited in the amount of knowledge it can incorporate. Simulating thousands or even millions of possible future scenarios makes the way an econometric method exploits a market inefficiency more robust.
Based on this premise, I have created a toolbox that packages different stochastic processes (a.k.a. valuation methods) for back-testing on synthetic data.
The processes included in this version of the toolbox are:
Stock prices
- Brownian Motion
- Geometric Brownian motion
- Merton model
- Heston model
Bond Rates
- Vasicek model
- Cox Ingersoll Ross model
Without further ado, let's briefly dive into each process and how you can use the toolbox in your Matlab session.
Introduction to the Matlab class
All the processes generate price paths for an asset based on the user's configuration. The user initializes the class with the following command; note that the parameters must be passed as name-value arguments.
```matlab
%{
Create an object of the randomProcesses class. This reads as follows:
generate 5 securities with 252 data points each, where the time step
between observations is 1 and the start price of each security is $100.
%}
sim = randomProcesses("n", 5, "T", 252, "h", 1, "s0", 100);
```
In this case, each name-value argument is defined as follows:
- T: number of observations to generate for each time series.
- h: size of the time step between observations.
- n: number of paths to generate.
- s0: initial value of each path. Be aware that when simulating rates, this number is interpreted as a percentage (e.g., 30 = 0.3 in the rates environment).
- sigma: trading intensity. This parameter is used for the volume generation process and is not related to the associated volatility of each instrument.
For a quick look at the documentation of each process, run the following command in the Matlab console:
```matlab
doc("randomProcesses")
```
Stochastic Methods implemented
Stock Prices
Brownian Motion
This method implements a discrete-time stochastic process for a Brownian motion that satisfies the following stochastic differential equation (SDE):
$$dX_t = \mu X_t\,dt + \sigma X_t\,dW_t, \qquad X(0) = X_0$$
The Euler–Maruyama method is used for the numerical solution of the SDE and has the following recurrence:
$$X(k+1) = X(k) + \mu X(k)\,\Delta t + \sigma X(k)\,\Delta W(k), \qquad \Delta W(k) = Z(k)\sqrt{\Delta t},$$
where $Z(k)$ is Gaussian white noise (independent standard normal draws).
The name-value arguments for the method are:
- mu(float): Historical mean of returns.
- sigma(float): Historical volatility of returns.
- sto_vol(logical): Optional argument for the helper that states whether the volatility is constant or stochastic in the data-generation process. Default is false for this process.
```matlab
% Generate the price paths and save the variable
brownian_prices = sim.brownian_prices("mu", 0.04, "sigma", 0.15);

% Plot the results
plot(brownian_prices)
title('Assets simulated prices for Brownian Motion')
ylabel('Prices')
xlabel('Time step')
```
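To make the recurrence concrete, here is a minimal standalone sketch of the Euler–Maruyama scheme for the SDE above, written in plain Matlab without calling the toolbox; it is not the toolbox's implementation. The variable names, the seed, and the time step `dt = 1/252` (annualized `mu` and `sigma` with daily steps) are assumptions for illustration, and the `sto_vol` option is not modeled.

```matlab
% Minimal Euler–Maruyama sketch for dX = mu*X dt + sigma*X dW
% (standalone illustration; not the toolbox implementation).
rng(0);                 % fix the seed for reproducibility
n_paths = 5;            % number of simulated paths (assumed)
n_steps = 252;          % observations per path (assumed)
dt      = 1/252;        % assumed: daily steps with annualized mu and sigma
mu      = 0.04;         % drift
sigma   = 0.15;         % volatility
X0      = 100;          % initial price

X = zeros(n_steps + 1, n_paths);
X(1, :) = X0;
for k = 1:n_steps
    dW = sqrt(dt) * randn(1, n_paths);          % Brownian increments Z(k)*sqrt(dt)
    X(k + 1, :) = X(k, :) + mu * X(k, :) * dt ...
                  + sigma * X(k, :) .* dW;      % one Euler–Maruyama step
end
```

Each row of `X` is a time step and each column a path, matching the layout that `plot` expects in the toolbox example.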
Geometric Brownian Motion
- mu(float): Historical mean of returns.
- sigma(float): Historical volatility of returns
- sto_vol(logical): Optional argument for the helper that states if the volatility should be constant or stochastic in the data generation process. Default is TRUE for this process.
```matlab
% Generate the price paths and save the variable
gbm_prices = sim.gbm_prices("mu", 0.04, "sigma", 0.15);

% Plot the results
plot(gbm_prices)
title('Assets simulated prices for Geometric Brownian Motion')
ylabel('Prices')
xlabel('Time step')
```
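The GBM section lists only its parameters, so here is one common way to simulate the process, sketched under assumptions: it uses the closed-form log-normal step $S(k+1) = S(k)\exp\left((\mu - \sigma^2/2)\Delta t + \sigma\sqrt{\Delta t}\,Z(k)\right)$, which keeps prices strictly positive. Whether the toolbox uses this exact step or an Euler discretization is not stated; `dt`, the seed, and the variable names are illustrative, and `sto_vol` is not modeled.

```matlab
% Exact-solution sketch of geometric Brownian motion:
% S(k+1) = S(k) * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*Z(k))
rng(0);
n_paths = 5;  n_steps = 252;  dt = 1/252;   % assumed sizes and time step
mu = 0.04;  sigma = 0.15;  S0 = 100;

Z = randn(n_steps, n_paths);                % standard normal shocks
log_increments = (mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * Z;
% Prepend a row of zeros so the first row equals S0, then accumulate
S = S0 * exp([zeros(1, n_paths); cumsum(log_increments, 1)]);
```

Working in log space and exponentiating once avoids the loop and guarantees positive prices.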
Merton’s Jump-Diffusion Model
- lambda(double): Arrival frequency of important pieces of information (jumps), expressed as the number of time steps between arrivals.
- mu(double): Historical mean of returns.
- sigma(double): Historical volatility of returns.
- sto_vol(logical): Optional argument for the helper that states if the volatility should be constant or stochastic in the data generation process. Default is TRUE for this process.
```matlab
% Generate the price paths and save the variable.
% Critical information arrives every 30 iterations until the end
% of the data points.
merton_prices = sim.merton_prices("mu", 0.04, "sigma", 0.15, 'lambda', 30);

% Plot the results
plot(merton_prices)
title('Assets simulated prices for Merton’s Jump-Diffusion model')
ylabel('Prices')
xlabel('Time step')
```
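A minimal jump-diffusion sketch follows. It assumes that a jump arrives at each step with probability `1/lambda`, so about once every `lambda` steps on average rather than on a fixed schedule, and that log-jump sizes are normal with the hypothetical parameters `jump_mu` and `jump_sigma`, which are not arguments of the toolbox. The drift is not compensated for the expected jump size, and this is not the toolbox's code.

```matlab
% Merton jump-diffusion sketch: GBM log-returns plus occasional
% normally distributed log-jumps (standalone illustration).
rng(0);
n_paths = 5;  n_steps = 252;  dt = 1/252;   % assumed sizes and time step
mu = 0.04;  sigma = 0.15;  S0 = 100;
lambda     = 30;      % average number of steps between jumps
jump_mu    = -0.05;   % hypothetical mean of the log-jump size
jump_sigma = 0.10;    % hypothetical std. dev. of the log-jump size

% Diffusive part of each log-return
diffusion = (mu - 0.5 * sigma^2) * dt + sigma * sqrt(dt) * randn(n_steps, n_paths);
% Bernoulli arrivals: a jump occurs with probability 1/lambda per step
has_jump  = rand(n_steps, n_paths) < 1 / lambda;
jumps     = has_jump .* (jump_mu + jump_sigma * randn(n_steps, n_paths));
% Accumulate log-returns and convert back to prices
S = S0 * exp([zeros(1, n_paths); cumsum(diffusion + jumps, 1)]);
```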
Heston Model
- rf(double): Risk-free interest rate, the theoretical rate of return on an asset carrying no risk. Default value is 0.02
- theta(double): Long-term price variance. Default value is 1
- k(double): Rate of reversion to the long-term variance. Default value is 0.5
- sigma(double): Historical volatility of returns. Default value is 1
- sto_vol(logical): Optional argument for the helper that states if the volatility should be constant or stochastic in the data generation process. Default is FALSE for this process.
```matlab
% Generate the price paths and save the variable
heston_prices = sim.heston_prices('rf', 0.01, 'theta', 0.5, ...
    'k', 0.8, 'sigma', 0.2);

% Plot the results
plot(heston_prices)
title('Assets simulated prices for the Heston model')
ylabel('Prices')
xlabel('Time step')
```
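The Heston model couples the price with a mean-reverting variance, $dS_t = r_f S_t\,dt + \sqrt{v_t}\,S_t\,dW^S_t$ and $dv_t = k(\theta - v_t)\,dt + \sigma\sqrt{v_t}\,dW^v_t$. The sketch below discretizes it with a full-truncation Euler scheme for the variance (negative values are floored at zero inside the drift and diffusion terms) and a log-Euler step for the price. Here `sigma` is treated as the volatility of the variance, as in the textbook model; the initial variance `v0` and the correlation `rho` are hypothetical, since the parameter list above does not include them. This is an illustrative sketch, not the toolbox's implementation.

```matlab
% Heston sketch: full-truncation Euler for the variance, log-Euler for the price
rng(0);
n_paths = 5;  n_steps = 252;  dt = 1/252;   % assumed sizes and time step
S0 = 100;  rf = 0.01;  theta = 0.5;  k = 0.8;  sigma = 0.2;
v0  = theta;   % hypothetical: start the variance at its long-term level
rho = -0.7;    % hypothetical correlation between price and variance shocks

S = zeros(n_steps + 1, n_paths);  v = zeros(n_steps + 1, n_paths);
S(1, :) = S0;  v(1, :) = v0;
for t = 1:n_steps
    z1 = randn(1, n_paths);
    z2 = rho * z1 + sqrt(1 - rho^2) * randn(1, n_paths);   % correlated shocks
    v_pos = max(v(t, :), 0);                                % full truncation
    % Log-Euler step keeps the price strictly positive
    S(t + 1, :) = S(t, :) .* exp((rf - 0.5 * v_pos) * dt + sqrt(v_pos * dt) .* z1);
    v(t + 1, :) = v(t, :) + k * (theta - v_pos) * dt + sigma * sqrt(v_pos * dt) .* z2;
end
```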
Bond Rates
Vasicek interest rate model
- mu(double): Long-term mean level. All future trajectories of s will evolve around a mean level μ in the long run. Default value is 0
- sigma(double): Instantaneous volatility, measures instant by instant the amplitude of randomness entering the system. Higher σ implies more randomness. Default value is 1
- lambda(double): Speed of reversion. λ characterizes the velocity at which such trajectories will regroup around μ in time. Default value is 0.5
- sto_vol(logical): Optional argument for the helper that states if the volatility should be constant or stochastic in the data generation process. Default is FALSE for this process.
```matlab
% Create the object for the rate series. It reads as follows: create 5
% instruments with 252 observations each, where the time step between
% observations is 1 and the initial rate is 0.02 (i.e., 2%)
sim2 = randomProcesses('n', 5, 'T', 252, 'h', 1, 's0', 2);

% Generate the rate paths and save the variable
vas_rates = sim2.vas_rates("mu", 0.018, "sigma", 0.03, 'lambda', 0.9);

% Plot the results
plot(vas_rates)
title('Rates simulated for the Vasicek interest rate model')
ylabel('Rates')
xlabel('Time step')
```
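The Vasicek model is usually written $dr_t = \lambda(\mu - r_t)\,dt + \sigma\,dW_t$, using the parameter names above. A minimal Euler sketch of that SDE in plain Matlab follows; `dt`, the seed, and the variable names are assumptions, and the toolbox's own discretization may differ. Because the shocks are Gaussian, Vasicek rates can turn negative.

```matlab
% Euler sketch of the Vasicek model: dr = lambda*(mu - r) dt + sigma dW
rng(0);
n_paths = 5;  n_steps = 252;  dt = 1/252;   % assumed sizes and time step
r0 = 0.02;  mu = 0.018;  sigma = 0.03;  lambda = 0.9;

r = zeros(n_steps + 1, n_paths);
r(1, :) = r0;
for t = 1:n_steps
    dW = sqrt(dt) * randn(1, n_paths);                         % Brownian increments
    r(t + 1, :) = r(t, :) + lambda * (mu - r(t, :)) * dt + sigma * dW;  % pull toward mu
end
```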
Cox-Ingersoll-Ross interest rate model
- mu(double): The long-term mean level. All future trajectories of s will evolve around a mean level μ in the long run. The default value is 0
- sigma(double): Instantaneous volatility, measures instant by instant the amplitude of randomness entering the system. Higher σ implies more randomness. The default value is 1
- lambda(double): Speed of reversion. λ characterizes the velocity at which such trajectories will regroup around μ in time. The default value is 0.5
- sto_vol(logical): Optional argument for the helper that states whether the volatility is constant or stochastic in the data-generation process. Default is false for this process.
```matlab
% Generate the rate paths and save the variable
cir_rates = sim2.cir_rates("mu", 0.018, "sigma", 0.03, 'lambda', 0.9);

% Plot the results
plot(cir_rates)
title('Rates simulated for the Cox-Ingersoll-Ross interest rate model')
ylabel('Rates')
xlabel('Time step')
```
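The Cox-Ingersoll-Ross model adds a square-root diffusion term, $dr_t = \lambda(\mu - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t$, so the noise shrinks as rates approach zero. The sketch below uses a full-truncation Euler scheme (the rate is floored at zero inside the drift and the square root) and reports the floored rates; the scheme, `dt`, and the variable names are assumptions, not the toolbox's code.

```matlab
% Full-truncation Euler sketch of CIR: dr = lambda*(mu - r) dt + sigma*sqrt(r) dW
rng(0);
n_paths = 5;  n_steps = 252;  dt = 1/252;   % assumed sizes and time step
r0 = 0.02;  mu = 0.018;  sigma = 0.03;  lambda = 0.9;

r = zeros(n_steps + 1, n_paths);
r(1, :) = r0;
for t = 1:n_steps
    r_pos = max(r(t, :), 0);                 % floor at zero (full truncation)
    dW = sqrt(dt) * randn(1, n_paths);
    r(t + 1, :) = r(t, :) + lambda * (mu - r_pos) * dt + sigma * sqrt(r_pos) .* dW;
end
rates = max(r, 0);                           % report non-negative rates
```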
Utilities
Order Flow
- eta(double): Proportion of informed trades. The default value is 0.1
- M(double): Proportion of liquidity seekers. The default value is 0.3
- market_prices(matrix): tick prices for a financial instrument.
```matlab
% Generate volumes for the first Heston price path
volumes = sim.order_flow("eta", 0.15, "market_prices", heston_prices(:, 1));

% Plot the results
bar(volumes, 'EdgeColor', 'none');
ylabel({'Volume'});
xlabel({'Time Step'});
title({'Generated Volumes for a Heston model'});
```
Information-Driven Bars
Tick Imbalance Bars
Consider a sequence of ticks ${\left\lbrace \left(p_t \;,v_t \;\right)\right\rbrace}_{t=1,\ldotp \ldotp \ldotp ,T}$ , where $p_t$ is the price associated with tick $t$ and $v_t$ is the volume associated with tick $t$. The so-called tick rule defines a sequence ${\left\lbrace b_t \right\rbrace }_{t=1,\ldotp \ldotp \ldotp ,T}$ where:
$$b_t = \begin{cases} b_{t-1} & \text{if } \Delta p_t = 0 \\ \dfrac{|\Delta p_t|}{\Delta p_t} & \text{if } \Delta p_t \neq 0 \end{cases}$$
with $b_t \in \{-1, 1\}$, and the boundary condition $b_0$ set to match the terminal value $b_T$ from the immediately preceding bar. The idea behind tick imbalance bars (TIBs) is to sample bars whenever tick imbalances exceed our expectations. We wish to determine the tick index, $T$, such that the accumulation of signed ticks (signed according to the tick rule) exceeds a given threshold. Next, let us discuss the procedure to determine $T$.
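Before moving on, the tick rule defined above can be sketched in a few lines of plain Matlab. The prices are made up, `b0` stands in for the terminal sign of the previous bar (an assumption), and because $\Delta p_t$ needs a prior price, `b` starts at the second tick of the example.

```matlab
% Tick rule: b_t = |dp_t|/dp_t when the price changes, otherwise b_{t-1}
p  = [100 100.5 100.5 100.2 100.2 100.3];   % example tick prices
b0 = 1;                                      % assumed terminal sign of the previous bar
dp = diff(p);                                % price changes between consecutive ticks
b  = zeros(size(dp));
prev = b0;
for t = 1:numel(dp)
    if dp(t) ~= 0
        prev = sign(dp(t));                  % equals |dp|/dp for nonzero dp
    end
    b(t) = prev;                             % carry the last sign on unchanged prices
end
```

For these prices the signs are `[1 1 -1 -1 1]`: the unchanged prices at the third and fifth ticks inherit the sign of the previous change.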
First, we define the tick imbalance at time $T$ as:
$$\theta_T = \sum_{t=1}^{T} b_t$$
Second, we compute the expected value of $\theta_T$ at the beginning of the bar, $E_0[\theta_T] = E_0[T]\left(P[b_t = 1] - P[b_t = -1]\right)$, where $E_0[T]$ is the expected size of the tick bar, $P[b_t = 1]$ is the unconditional probability that a tick is classified as a buy, and $P[b_t = -1]$ is the unconditional probability that a tick is classified as a sell. Since $P[b_t = 1] + P[b_t = -1] = 1$, it follows that $E_0[\theta_T] = E_0[T]\left(2P[b_t = 1] - 1\right)$.
In practice, we can estimate $E_0 \left\lbrack T\right\rbrack$ as an exponentially weighted moving average of $T$ values from prior bars, and $\left(2P\left\lbrack b_t =1\right\rbrack -1\right)$ as an exponentially weighted moving average of $b_t$ values from prior bars.
Third, we define a tick imbalance bar (TIB) as a $T^*$-contiguous subset of ticks such that the following condition is met:
$$T^* = \arg\min_T \left\{ |\theta_T| \geq E_0[T]\,\left|2P[b_t = 1] - 1\right| \right\}$$
where the size of the expected imbalance is implied by $\left|2P[b_t = 1] - 1\right|$. When $\theta_T$ is more imbalanced than expected, a low $T$ will satisfy these conditions. Accordingly, TIBs are produced more frequently in the presence of informed trading (asymmetric information that triggers one-sided trading). In fact, we can understand TIBs as buckets of trades containing equal amounts of information (regardless of the volumes, prices, or ticks traded).
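Putting the three steps together, a simplified TIB sampler might look like the following. It is a sketch under several assumptions: the signed ticks are synthetic (each tick is a buy with probability 0.55), the EWMA smoothing factor is derived from `window` as `2/(window+1)`, the expected imbalance is updated with the average $b_t$ of each closed bar, and the initial expectations `E_T` and `E_b` are arbitrary seeds. The toolbox's `tib` method may differ on any of these points.

```matlab
% TIB sampling sketch: close a bar at the first tick where
% |theta_T| >= E0[T] * |2P[b_t = 1] - 1|, with both expectations
% tracked as EWMAs over previous bars (standalone illustration).
rng(0);
b = 2 * (rand(1, 1000) < 0.55) - 1;  % synthetic signed ticks in {-1, 1}
window = 20;                         % EWMA span, mirroring the "window" argument
alpha  = 2 / (window + 1);           % assumed EWMA smoothing factor
E_T    = 20;                         % assumed initial expected bar length E0[T]
E_b    = 0.5;                        % assumed initial expected imbalance 2P[b=1]-1

bar_ends = [];                       % tick index at which each bar closes
theta = 0;  start = 1;
for t = 1:numel(b)
    theta = theta + b(t);                            % running imbalance theta_T
    if abs(theta) >= E_T * abs(E_b)                  % imbalance exceeds expectation
        T_bar = t - start + 1;                       % length of the bar just closed
        E_T = alpha * T_bar + (1 - alpha) * E_T;     % update E0[T]
        E_b = alpha * mean(b(start:t)) + (1 - alpha) * E_b;  % update 2P[b=1]-1
        bar_ends(end + 1) = t;                       %#ok<AGROW>
        theta = 0;  start = t + 1;                   % reset for the next bar
    end
end
```

The initial values matter: if `E_b` drifts toward zero, the threshold shrinks and bars close on almost every tick, so a production version may want to bound the threshold.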
The name-value arguments for the method are:
- ticks(matrix): tick market prices for security with the corresponding volumes. There is no default value for this parameter.
- window(double): number of prior observations to use for the sampling. The default value is 15.
Usage:
```matlab
% Create a matrix of prices and volumes
tick_prices = [heston_prices(:, 1) volumes];

% The output is an OHLCV dataset
tib = sim.tib("ticks", tick_prices, "window", 20);

% Plot the results
priceandvol(tib);
```
Volume and Dollar Imbalance Bars

```matlab
% Create a matrix of prices and volumes
tick_prices = [heston_prices(:, 1) volumes];

% The output is an OHLCV dataset of dollar imbalance bars. For volume
% imbalance bars, change the method name to vib.
dib = sim.dib("ticks", tick_prices, "window", 20);

% Plot the results
priceandvol(dib);
```
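Under the usual definition of these bars (an assumption about what `vib` and `dib` compute), volume imbalance bars replace the signed tick count $\theta_T = \sum_{t=1}^{T} b_t$ with the signed volume $\sum_{t=1}^{T} b_t v_t$, and dollar imbalance bars with the signed dollar value $\sum_{t=1}^{T} b_t p_t v_t$; the bar-closing rule then compares $|\theta_T|$ with its expected value exactly as for TIBs. The running imbalances can be sketched as follows, with made-up ticks:

```matlab
% Signed volume and dollar imbalance that VIBs/DIBs accumulate in place of
% the signed tick count used by TIBs (standalone sketch, assumed definitions).
p = [100 100.5 100.5 100.2];    % example tick prices
v = [10 20 5 40];               % example tick volumes
b = [1 1 1 -1];                 % tick-rule signs (b_1 assumed from the previous bar)
theta_volume = cumsum(b .* v);        % running sum of b_t * v_t
theta_dollar = cumsum(b .* p .* v);   % running sum of b_t * p_t * v_t
```

Here `theta_volume` is `[10 30 35 -5]`: the large sell at the last tick flips the volume imbalance even though three of the four ticks are buys.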