Fix reproducibility, refactor `simulate_data.py`, use functions in tests #545

## Problem

  The `causalpy/data/simulate_data.py` module has reproducibility issues and could benefit from some refactoring.

  ### Reproducibility

  The module declares a seeded RNG but doesn't use it consistently:

  ```python
  rng = np.random.default_rng(RANDOM_SEED)  # Declared on line 27, only used once

  # Most functions use unseeded random:
  norm(0, 0.25).rvs(N)           # scipy.stats uses global numpy state
  np.random.choice(2, size=N)     # Uses global numpy state
```

  Result: the functions produce different data on each run, so the generated CSV files cannot be reproduced.
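  To make the difference concrete, here is a minimal sketch contrasting draws from NumPy's global state with draws from a seeded `Generator`. The helper names `draw_global` and `draw_seeded` and the seed value are illustrative assumptions, not code from the module.

```python
import numpy as np

RANDOM_SEED = 8927  # illustrative value, not necessarily the module's seed
N = 5


def draw_global(n):
    # Uses NumPy's global RandomState, so the output depends on
    # whatever random calls happened earlier in the process.
    return np.random.choice(2, size=n), np.random.normal(0, 0.25, size=n)


def draw_seeded(n, seed=RANDOM_SEED):
    # A fresh Generator per call: the same seed always gives the same draws.
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=n), rng.normal(0, 0.25, size=n)


a_choice, a_noise = draw_seeded(N)
b_choice, b_noise = draw_seeded(N)
print(np.array_equal(a_choice, b_choice), np.array_equal(a_noise, b_noise))
# prints: True True
```

  Where a `scipy.stats` distribution has to stay, `rvs` also accepts a `random_state` argument, so the same `Generator` can be passed there instead of falling back to global state.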

  CSV Usage:

  - Many CSVs committed to git
  - Cannot regenerate them deterministically
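
  A quick way to check deterministic regeneration is to generate the same dataset twice and compare the CSV text. This is only a sketch: `generate_example_data` is a hypothetical stand-in for one of the module's generators, and its columns are made up.

```python
import numpy as np
import pandas as pd


def generate_example_data(n=100, seed=42):
    """Hypothetical stand-in for one of the module's generators."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "x": rng.normal(0, 1, size=n),
            "treated": rng.integers(0, 2, size=n),
        }
    )


# With no path argument, to_csv returns the CSV content as a string.
first = generate_example_data().to_csv(index=False)
second = generate_example_data().to_csv(index=False)
assert first == second  # same seed, same CSV text
```

  Note that NumPy may change a `Generator`'s output stream between releases, so identical output is only expected for the same NumPy version.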

## Proposed Solution

1. Add a `seed` parameter to all generation functions
2. Replace `norm().rvs()` with `rng.normal()`, `dirichlet().rvs()` with `rng.dirichlet()`, etc.
3. Delete the generated CSV files; use pytest fixtures instead
4. Update tests to generate data dynamically
5. Fix bug: `create_series()` ignores its `length_scale` parameter (line 488)
6. Reduce duplication (lines 87-93: repeated function calls)
7. Other light-touch refactoring (separation of responsibilities in functions, reduce LOC in `_smoothed_gaussian_random_walk` (lines 87-5))
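
Steps 1 to 4 could look roughly like the sketch below. The function `generate_weights_data`, its parameters, and the fixture name are hypothetical and chosen only for illustration; the real generators in `simulate_data.py` have their own signatures.

```python
import numpy as np
import pandas as pd
import pytest


def generate_weights_data(n_obs=50, n_units=3, seed=None):
    """Hypothetical generator; `seed` may be an int, a Generator, or None."""
    # default_rng returns an existing Generator unchanged, so callers
    # can share one RNG across several generation functions.
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(n_units))        # instead of dirichlet(...).rvs()
    units = rng.normal(0, 1, size=(n_obs, n_units))  # instead of norm(0, 1).rvs(...)
    noise = rng.normal(0, 0.25, size=n_obs)
    df = pd.DataFrame(units, columns=[f"unit_{i}" for i in range(n_units)])
    df["y"] = units @ weights + noise
    return df


@pytest.fixture
def weights_data():
    # Generated on demand for each test instead of being read from a committed CSV.
    return generate_weights_data(seed=42)


def test_same_seed_same_data(weights_data):
    pd.testing.assert_frame_equal(weights_data, generate_weights_data(seed=42))
```

Accepting either an integer or a `Generator` keeps the default call simple while still letting a test or a composite generator control the random stream explicitly.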

