Evo-CoT: Evolutionary Optimization of Chain-of-Thoughts
This repository contains code for the Evo-CoT framework, which uses staged evolutionary algorithms to generate, align, and correct chain-of-thought (CoT) reasoning exemplars. The framework is designed to explore novel reasoning patterns, refine them for problem alignment, and select top-quality CoTs using LLM-based correction.
- Installation
1.1 Python Version
Python >= 3.10 recommended.
1.2 Dependencies
Install required packages using pip:
pip install -r requirements.txt
Key dependencies include:
numpy – numerical computations
scipy – scientific operations (optional)
matplotlib – plotting results
tqdm – progress bars
transformers – LLM alignment and evaluation (for Stage 2/3)
torch – PyTorch backend for LLM inference
(Optional: jupyter or ipython for interactive experiments)
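A minimal requirements.txt matching the dependency list above might look like the following; it lists package names only, since exact version pins are an assumption best left to your environment:

```
numpy
scipy
matplotlib
tqdm
transformers
torch
```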
- Dataset / Population Initialization
The framework requires an initial CoT population stored as JSON (population.json).
Each entry must include:
problem : problem statement
cot : initial chain-of-thought
answer : ground truth answer (optional for exploration)
Download Instructions:
If using a benchmark dataset (e.g., GSM8K, MATH, or custom problems), preprocess into the above JSON format.
Example JSON snippet:
[
  {
    "problem": "In a class of 40 students, 80% have puppies. 25% of those also have parrots. How many have both?",
    "cot": "First calculate the number of students with puppies. Then compute the subset with parrots.",
    "answer": "8"
  },
  ...
]
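A small loader along these lines can sanity-check population.json before running Stage 1. The field names follow the schema above; the function name itself is illustrative, not part of the repository's API:

```python
import json

REQUIRED_FIELDS = ("problem", "cot")  # "answer" is optional for exploration

def load_population(path="population.json"):
    """Load the initial CoT population and verify required fields."""
    with open(path) as f:
        population = json.load(f)
    for i, entry in enumerate(population):
        missing = [k for k in REQUIRED_FIELDS if k not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing fields: {missing}")
    return population
```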
- Running Experiments
3.1 Stage 1: Exploration
python stage1_exploration.py
Generates diverse CoTs using meta-heuristics, semantic-preserving mutations, and crossovers.
Logs fitness, diversity, and generation statistics.
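The Stage 1 loop can be sketched as a standard generational EA with elitism. The mutation, crossover, and fitness callables below are stand-ins, assumed for illustration, for the framework's semantic-preserving operators and fitness scoring:

```python
import random

def evolve(population, fitness_fn, mutate_fn, crossover_fn,
           generations=10, elite_frac=0.1, seed=0):
    """Generic generational loop: score, keep elites, refill via variation."""
    rng = random.Random(seed)
    for _ in range(generations):
        scored = sorted(population, key=fitness_fn, reverse=True)
        n_elite = max(1, int(elite_frac * len(scored)))
        next_gen = scored[:n_elite]  # elitism: carry the best CoTs forward
        while len(next_gen) < len(population):
            # select parents from the top half, then vary
            a, b = rng.sample(scored[:len(scored) // 2], 2)
            next_gen.append(mutate_fn(crossover_fn(a, b), rng))
        population = next_gen
    return sorted(population, key=fitness_fn, reverse=True)
```

The real scripts additionally log fitness, diversity, and per-generation statistics; this sketch only shows the control flow.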
3.2 Stage 2: Alignment
python stage2_alignment.py
Aligns top Stage 1 CoTs to their respective problems using LLM guidance.
No evolution occurs here; purely alignment and structural refinement.
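Because Stage 2 is pure refinement, it reduces to prompting an LLM with each problem and its candidate CoT and keeping the rewritten reasoning. The prompt template below is an assumed illustration, not the repository's actual prompt:

```python
def build_alignment_prompt(problem, cot):
    """Assemble an LLM prompt asking for a problem-aligned rewrite of a CoT."""
    return (
        "Rewrite the reasoning so that every step refers to the problem below.\n"
        f"Problem: {problem}\n"
        f"Candidate chain-of-thought: {cot}\n"
        "Aligned chain-of-thought:"
    )
```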
3.3 Stage 3: Correction & Ranking
python stage3_correction.py
Uses LLM-based scoring to assign correctness fitness.
Ranks and selects Top-K CoTs.
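Given per-CoT correctness scores from the LLM judge, Top-K selection is a simple sort-and-slice. The score field name here is an assumption for illustration:

```python
def select_top_k(population, k=10, score_key="correctness"):
    """Rank CoT entries by LLM-assigned correctness and keep the best k."""
    ranked = sorted(population, key=lambda e: e[score_key], reverse=True)
    return ranked[:k]
```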
- Reproducibility Notes
Random seeds are set in all stages, but LLM-based alignment may introduce non-determinism.
Stage 1 results can vary slightly depending on mutation and crossover operations.
Save population snapshots (population_stage1_genX.json) to resume experiments or compare intermediate results.
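Snapshots can be plain JSON dumps keyed by generation, following the filename convention above; the helper names are illustrative:

```python
import json

def save_snapshot(population, generation, prefix="population_stage1_gen"):
    """Write the current population to e.g. population_stage1_gen5.json."""
    path = f"{prefix}{generation}.json"
    with open(path, "w") as f:
        json.dump(population, f, indent=2)
    return path

def load_snapshot(path):
    """Resume from a previously saved population snapshot."""
    with open(path) as f:
        return json.load(f)
```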
- Cost & Computational Considerations
Stage 1 with a population of 2,000 over 80 generations is computationally intensive (2,000 × 80 ≈ 160,000 total fitness evaluations).
LLM-based alignment and correction (Stage 2/3) can be GPU-accelerated for efficiency.
Suggested workflow for budgeted experiments:
- Run smaller populations or fewer generations for prototype testing.
- Run full-scale experiments on high-memory GPU nodes for final results.
Track elapsed time, mutation/crossover counts, and diversity to monitor experiment efficiency.
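As a quick budgeting aid, total fitness evaluations scale as population size × generations; a tiny helper (illustrative, not part of the repo) makes the prototype-vs-full-scale trade-off explicit:

```python
def fitness_eval_budget(pop_size, generations):
    """Upper bound on fitness evaluations for a generational EA run."""
    return pop_size * generations

# Full-scale run from the notes above: 2,000 population × 80 generations
full = fitness_eval_budget(2000, 80)      # 160000 evaluations
# Prototype run: 200 population × 10 generations
proto = fitness_eval_budget(200, 10)      # 2000 evaluations
```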
- Plotting Results
Use matplotlib to visualize fitness trends across generations:
import matplotlib.pyplot as plt

plt.plot(generations, avg_fitness, label='Average Fitness', color='blue')
plt.plot(generations, best_fitness, label='Best Fitness', color='red')
plt.xlabel('Generation')
plt.ylabel('Fitness')
plt.title('Stage 1 Fitness Evolution')
plt.legend()
plt.savefig('stage1_fitness_plot.png')
plt.show()
Upload .png or .pdf images to Overleaf for paper figures.