---
title: "Getting Started with EmpiricalDynamics"
author: "EmpiricalDynamics Package"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with EmpiricalDynamics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

# Introduction

The `EmpiricalDynamics` package provides a complete workflow for discovering differential equations from time series data. This approach, sometimes called "equation discovery" or "symbolic regression for dynamics," allows researchers to uncover the mathematical laws governing observed phenomena.

The package implements a six-step workflow:

1. **Preprocessing**: Compute numerical derivatives from noisy time series
2. **Exploration**: Visual analysis to identify functional relationships
3. **Symbolic Search**: Discover equation structure automatically
4. **Residual Analysis**: Construct stochastic differential equations (SDEs)
5. **Validation**: Cross-validation, trajectory simulation, qualitative checks
6. **Output**: Generate publication-ready reports and LaTeX equations

# Installation

```{r, eval=FALSE}
# Install from local source
install.packages("EmpiricalDynamics", repos = NULL, type = "source")

# Or using devtools
devtools::install_github("your-repo/EmpiricalDynamics")
```

# Quick Start Example

Let's discover the equation governing logistic population growth.

## Step 1: Load Data and Compute Derivatives

```{r quickstart-prep, eval=FALSE}
library(EmpiricalDynamics)

# Load example data
logistic_growth <- load_example_data("logistic_growth")

# View the data
head(logistic_growth)

# Compute derivative using Total Variation Regularization (recommended)
deriv <- compute_derivative(
  logistic_growth$Z,
  time = logistic_growth$time,
  method = "tvr",
  lambda = "auto"
)

# Add derivative to data
logistic_growth$dZ <- deriv$derivative

# Plot diagnostic
plot(deriv)
```

## Step 2: Explore the Data

```{r quickstart-explore, eval=FALSE}
# Explore relationships between dZ/dt and Z
exploration <- explore_dynamics(
  data = logistic_growth,
  response = "dZ",
  predictors = "Z"
)

# View suggestions for functional forms
print(exploration$suggestions)
```

## Step 3: Fit the Equation

Based on theory, logistic growth follows:

$$\frac{dZ}{dt} = rZ\left(1 - \frac{Z}{K}\right)$$

```{r quickstart-fit, eval=FALSE}
# Fit the theoretical equation
equation <- fit_specified_equation(
  "r * Z * (1 - Z / K)",
  data = logistic_growth,
  derivative_col = "dZ",
  start = list(r = 0.5, K = 100)
)

# View results
summary(equation)
print(coef(equation))
```

## Step 4: Validate the Model

```{r quickstart-validate, eval=FALSE}
# Run comprehensive validation
validation <- validate_model(
  equation = equation,
  data = logistic_growth,
  derivative_col = "dZ",
  variable = "Z",
  cv_folds = 5
)

print(validation)
```

## Step 5: Generate Output

```{r quickstart-output, eval=FALSE}
# Get LaTeX equation
latex_eq <- to_latex(equation, variable = "Z")
cat(latex_eq)

# Generate full report (using tempdir() per CRAN policy)
generate_report(
  results = list(
    data = logistic_growth,
    equation = equation,
    validation = validation
  ),
  output_file = file.path(tempdir(), "logistic_analysis"),
  format = "markdown"
)
```
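If the example data set is unavailable in your installation, you can follow along with synthetic data instead. The sketch below is not part of the package: it integrates the logistic ODE with a forward Euler step and adds Gaussian observation noise, producing a data frame with the same `time` and `Z` columns used above. The parameter values are illustrative assumptions.

```{r synthetic-data, eval=FALSE}
# Minimal sketch: simulate noisy logistic growth (r, K, and the noise level
# are illustrative assumptions, not values used by the package's example data)
set.seed(42)
r <- 0.5; K <- 100; dt <- 0.1
times <- seq(0, 30, by = dt)
Z <- numeric(length(times))
Z[1] <- 5
for (i in seq_along(times)[-1]) {
  # Forward Euler step for dZ/dt = r * Z * (1 - Z / K)
  Z[i] <- Z[i - 1] + r * Z[i - 1] * (1 - Z[i - 1] / K) * dt
}
logistic_growth <- data.frame(
  time = times,
  Z = Z + rnorm(length(Z), sd = 1)  # add observation noise
)
```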
# Detailed Workflow

## A. Numerical Differentiation

Computing reliable derivatives from noisy data is critical. The package offers several methods:

### Total Variation Regularization (TVR)

**Recommended for most applications**, especially economic data with trends and structural breaks.

```{r tvr-example, eval=FALSE}
deriv_tvr <- compute_derivative(
  y = data$Z,
  time = data$time,
  method = "tvr",
  lambda = "auto"  # Automatic regularization selection
)

# Plot diagnostic: shows original, derivative, reconstruction, residuals
plot_tvr_diagnostic(deriv_tvr)
```

TVR solves the optimization problem:

$$\min_{\dot{Z}} \left\| Z - \int \dot{Z} \, dt \right\|_2^2 + \lambda \|\Delta \dot{Z}\|_1$$

### Other Methods

```{r other-methods, eval=FALSE}
# Savitzky-Golay filter (preserves peaks, good for smooth data)
deriv_sg <- compute_derivative(data$Z, data$time, method = "savgol",
                               window = 11, order = 3)

# Smoothing spline (good for high noise)
deriv_spline <- compute_derivative(data$Z, data$time, method = "spline", spar = 0.7)

# Finite differences (simple, fast, sensitive to noise)
deriv_fd <- compute_derivative(data$Z, data$time, method = "fd")

# Spectral (for periodic data ONLY - warns about Gibbs phenomenon otherwise)
deriv_spec <- compute_derivative(data$Z, data$time, method = "spectral")
```

### Automatic Method Selection

```{r suggest-method, eval=FALSE}
# Get recommendation based on data characteristics
suggestion <- suggest_differentiation_method(data$Z, data$time)
print(suggestion)

# Compare all methods visually
compare_differentiation_methods(data$Z, data$time)
```

## B. Visual Exploration

Before formal fitting, explore the data visually:

```{r exploration, eval=FALSE}
# Comprehensive exploration
exploration <- explore_dynamics(
  data = data,
  response = "dZ",
  predictors = c("Z", "X", "Y")
)

# Individual plots
plot_phase_1d(data, x = "Z", y = "dZ")
plot_bivariate(data, x = "Z", y = "dZ")
plot_surface_3d(data, x1 = "Z", x2 = "X", y = "dZ")
```

## C. Equation Discovery

### Automated Symbolic Search

```{r symbolic-search, eval=FALSE}
# Genetic algorithm search (pure R)
search_result <- symbolic_search(
  data = data,
  response = "dZ",
  predictors = c("Z", "X"),
  backend = "r_genetic",
  max_complexity = 15,
  n_generations = 50,
  population_size = 100,
  n_runs = 3
)

# View Pareto front
plot_pareto_front(search_result)

# Select best equation
best_eq <- select_equation(search_result, criterion = "bic")
```

### Theory-Guided Fitting

```{r theory-guided, eval=FALSE}
# Fit a specific functional form
equation <- fit_specified_equation(
  "alpha + beta * Z + gamma * Z^2 + delta * X",
  data = data,
  derivative_col = "dZ",
  method = "levenberg-marquardt"  # Robust optimization
)

summary(equation)
```

### Julia Backend (Optional)

For large-scale symbolic regression, use the Julia backend:

```{r julia-backend, eval=FALSE}
# Setup Julia (one-time)
setup_julia_backend()

# Run with Julia's SymbolicRegression.jl
search_result <- symbolic_search(
  data = data,
  response = "dZ",
  predictors = c("Z", "X"),
  backend = "julia",
  max_complexity = 25
)
```
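The `criterion = "bic"` choice in `select_equation()` above trades goodness of fit against the number of parameters. As a minimal sketch, assuming Gaussian errors, BIC can be computed from the residual sum of squares as shown below (the standard formula up to an additive constant); how the package computes it internally may differ.

```{r bic-sketch, eval=FALSE}
# Minimal sketch of the trade-off behind criterion = "bic" (standard
# Gaussian-likelihood BIC up to a constant; illustration only)
bic_from_rss <- function(rss, n, k) {
  n * log(rss / n) + k * log(n)  # k = number of estimated parameters
}

# Hypothetical numbers: a 3-parameter fit must reduce RSS enough to
# overcome the extra log(n) penalty before BIC prefers it
n <- 200
bic_from_rss(rss = 55, n = n, k = 2)
bic_from_rss(rss = 52, n = n, k = 3)  # small gain, extra parameter penalized
```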
## D. Residual Analysis and SDE Construction

### Diagnostics

```{r residual-diag, eval=FALSE}
# Compute residuals
resid <- compute_residuals(equation, data, "dZ")

# Run diagnostic tests
diag <- residual_diagnostics(resid, data)
print(diag)
plot(diag)
```

The diagnostics include:

- **Ljung-Box test**: Autocorrelation in residuals
- **ARCH-LM test**: Conditional heteroscedasticity
- **Breusch-Pagan test**: Heteroscedasticity vs predictors
- **Jarque-Bera test**: Normality
- **Runs test**: Randomness

### Modeling the Diffusion

```{r diffusion, eval=FALSE}
# Model conditional variance
var_model <- model_conditional_variance(
  resid,
  data = data,
  predictors = c("Z"),
  method = "linear"  # or "quadratic", "symbolic"
)

# Construct SDE: dZ = f(Z,X) dt + g(Z) dW
sde <- construct_sde(
  drift = equation,
  diffusion = var_model,
  variable = "Z"
)

print(sde)
```

### Iterative GLS Estimation

When heteroscedasticity is significant, use iterative GLS:

```{r iterative-gls, eval=FALSE}
# Iterative estimation for unbiased drift coefficients
sde <- estimate_sde_iterative(
  drift_formula = "a + b * Z + c * Z^2",
  data = data,
  derivative_col = "dZ",
  diffusion_formula = "sigma * (1 + tau * abs(Z))",
  max_iter = 20,
  tolerance = 1e-4
)

# Check convergence
print(sde$converged)
print(sde$iterations)
```

## E. Validation

### Cross-Validation

```{r cv, eval=FALSE}
# Block cross-validation (recommended for time series)
cv <- cross_validate(
  equation,
  data = data,
  derivative_col = "dZ",
  k = 5,
  method = "block"
)

print(cv)
plot(cv)
```

### Trajectory Simulation

```{r trajectory, eval=FALSE}
# Simulate from the SDE
sim <- simulate_trajectory(
  sde,
  initial_conditions = c(Z = data$Z[1]),
  times = data$time,
  n_sims = 100,
  method = "euler"
)

# Plot with observed data
plot(sim, observed_data = data)

# Compare quantitatively
metrics <- compare_trajectories(sim, data, "time", "Z")
print(metrics)
```

### Qualitative Behavior

```{r qualitative, eval=FALSE}
# Analyze fixed points
fps <- analyze_fixed_points(equation, "Z", range = c(-10, 150))
print(fps)

# Bifurcation analysis (vary a parameter)
bifurc <- analyze_bifurcations(
  equation,
  variable = "Z",
  parameter = "K",
  param_range = c(50, 200)
)
plot(bifurc)

# Check against expected behavior
qual_check <- check_qualitative_behavior(
  equation, data, "Z",
  expected_features = list(
    n_fixed_points = 2,
    stability_pattern = c("unstable", "stable"),
    bounded = TRUE
  )
)
print(qual_check)
```

### Comprehensive Validation

```{r full-validation, eval=FALSE}
validation <- validate_model(
  equation = equation,
  sde = sde,
  data = data,
  derivative_col = "dZ",
  variable = "Z",
  time_col = "time",
  cv_folds = 5,
  n_sims = 100,
  expected_features = list(n_fixed_points = 2)
)

print(validation)
plot(validation)
```
## F. Output and Reporting

### LaTeX Equations

```{r latex, eval=FALSE}
# Convert to LaTeX
latex <- to_latex(equation, variable = "Z", precision = 4)
cat(latex)

# Full SDE in LaTeX
sde_latex <- to_latex(sde, variable = "Z")
cat(sde_latex)
```

### Tables

```{r tables, eval=FALSE}
# Coefficient table
coef_table <- coefficient_table(equation, format = "latex")
cat(coef_table)

# Model comparison
models <- list(
  "Linear" = fit_specified_equation("a + b*Z", data, "dZ"),
  "Quadratic" = fit_specified_equation("a + b*Z + c*Z^2", data, "dZ"),
  "Logistic" = fit_specified_equation("r*Z*(1-Z/K)", data, "dZ")
)
comparison <- model_comparison_table(models, data, "dZ", format = "markdown")
cat(comparison)
```

### Full Report

```{r report, eval=FALSE}
# Collect all results
results <- list(
  data = data,
  exploration = exploration,
  equation = equation,
  sde = sde,
  validation = validation
)

# Generate markdown report (using tempdir() per CRAN policy)
generate_report(
  results,
  output_file = file.path(tempdir(), "analysis_report"),
  format = "markdown",
  title = "My Dynamics Analysis"
)

# Export to multiple formats (using tempdir() per CRAN policy)
export_results(results, output_dir = tempdir(), formats = c("rds", "csv", "json"))

# Save plots (using tempdir() per CRAN policy)
save_plots(results, output_dir = tempdir(), format = c("png", "pdf"))
```

### Analysis Summary

```{r summary, eval=FALSE}
# Print summary to console
print_summary(results)

# Get template for new analyses (using tempdir() per CRAN policy)
get_analysis_template(file.path(tempdir(), "my_analysis.R"))
```

# Best Practices

## Differentiation

1. **Start with TVR** for economic/social science data
2. Use `suggest_differentiation_method()` for guidance
3. Always plot diagnostics to verify derivative quality
4. Consider data collection frequency via `diagnose_sampling_frequency()`

## Equation Discovery

1. **Use theory when available**: `fit_specified_equation()` is more reliable
2. For exploratory analysis, use symbolic search but validate thoroughly
3. Prefer simpler equations (parsimony)
4. Run multiple searches and compare

## Validation

1. **Always use block CV** for time series (not random CV)
2. Check residual diagnostics before trusting results
3. Verify qualitative behavior matches domain knowledge
4. Use bootstrap for uncertainty quantification

## Reporting

1. Report uncertainty (standard errors or confidence intervals)
2. Include model comparison with alternatives
3. Show validation metrics prominently
4. Discuss limitations and assumptions

# Further Reading

- Brunton, S. L., & Kutz, J. N. (2019). *Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control*. Cambridge University Press.
- Schmidt, M., & Lipson, H. (2009). Distilling free-form natural laws from experimental data. *Science*, 324(5923), 81-85.
- Chartrand, R. (2011). Numerical differentiation of noisy, nonsmooth data. *ISRN Applied Mathematics*, 2011.

# Session Info

```{r session-info}
sessionInfo()
```