Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support for Local and Remote Repositories #179

Merged
merged 19 commits into from
Jul 30, 2024
Merged

Conversation

RexWzh
Copy link
Contributor

@RexWzh RexWzh commented Jul 9, 2024

Hi,

Thanks for the awesome project!

This PR enhances the LeanGitRepo class to support local and remote repositories, introducing a repo_type property (github, local, remote), simplifying the _to_commit_hash method, transitioning lean_version to a string for clarity, and adding tests for the new repository configurations.

A Quick Review

We transition from pygithub objects to gitpython for managing repositories.

For compatibility, a new propertyrepo_type is introduced in the LeanGitRepo class, supporting three values: github, local, and remote. I've adapted the APIs to accommodate these types:

Type LeanGitRepo.repo LeanGitRepo.url
github GitHub object GitHub URL
local Git object (local) Local repository path
remote Git object (local) Remote repository URL

It's important to note that while the GitHub object connects to an online repo, the Git object interacts directly with a local git directory.

Please let me know if any further adjustments are needed.

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 9, 2024

I’m planning to update the tracing method based on these settings next, just to be sure it aligns with the developers' preferences, and also, to avoid complicating the review process.

Looking forward to your feedback.

relate issue: #153

@Iheadx
Copy link

Iheadx commented Jul 10, 2024

When I use the Local Repo in this code, It doesn't work as I expected. It seems that the dojo.py still need to be modified.

repo = LeanGitRepo("lean4-example", "7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f")
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")

with Dojo(theorem) as (dojo, init_state):
  print(init_state)

When using Dojo(theorem), an unknown URL type error will be reported.

@RexWzh RexWzh marked this pull request as draft July 10, 2024 10:17
@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 10, 2024

It seems that the dojo.py still need to be modified

These are not implemented yet.

I’m planning to update the tracing method based on these settings next.

I have only modified the LeanGitRepo class to ensure backward compatibility when initializing the Lean repository. No changes have been made to other components.

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 10, 2024

The basic idea is to replace the GitHub object with a git object. Additionally, we need to change the tracing process, which currently retrieves information via the GitHub API, to a direct way. There are intricate details to adjust, and it may take some time to finalize everything. For now, I've marked it as a draft PR.

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 10, 2024

I am updating the tracing and interaction code by python debugging and accompanying tests.

For the sake of a smoother review process, it may be prudent to handle the integration in smaller, manageable segments. If it isn't too cumbersome, we can also review everything after all updates are made. I will periodically write tests to ensure compatibility and prevent the introduction of bugs.

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 10, 2024

Hi @Iheadx @darabos , please take a look at this. ;)

An example of the test:

repo = LeanGitRepo(url=LOCAL_REPO_PATH, commit="main")
assert repo.repo_type == 'local'
theorem = Theorem(repo, "Lean4Example.lean", "hello_world")
# initial state
dojo, state_0 = Dojo(theorem).__enter__()
assert state_0.pp == 'a b c : Nat\n⊢ a + b + c = a + c + b'
# state after applying a tactic
state_1 = dojo.run_tac(state_0, "rw [add_assoc]")
assert state_1.pp == 'a b c : Nat\n⊢ a + (b + c) = a + c + b'
# state after applying the 'sorry' tactic
assert dojo.run_tac(state_1, "sorry") == ProofGivenUp()
# concluding the proof
final_state = dojo.run_tac(state_1, "rw [add_comm b, ←add_assoc]")
assert isinstance(final_state, ProofFinished)

All test cases have passed on my machine, ubuntu22.

image

Network error for one case:

image

@RexWzh RexWzh marked this pull request as ready for review July 10, 2024 23:05
@yangky11
Copy link
Member

Thank you. This is really helpful! I'll be able to take a look later this week or in the weekend!

@LuoKaiGSW
Copy link

hi, @RexWzh Thanks for your contribution!
repo = LeanGitRepo(url=LOCAL_REPO_PATH, commit="main") assert repo.repo_type == 'local'
This means that I can pull the lean4-example repository to my local machine and modify the Lean files as needed without needing to push the changes back to the remote repository, right? The way to do this is by specifying LOCAL_REPO_PATH as the path to the local repository.

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 20, 2024

Yes @LuoKaiGSW , you can use a local Git repository by passing the path in the url parameter or using LeanGitRepo.from_path.

See test examples here.

import os
# Avoid using remote cache
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
lean4_example_url = 'https://github.com/yangky11/lean4-example'
local_repo_path = "lean4-example"
clean_clone_and_checkout(lean4_example_url, local_repo_path, 'main')

local_repo = LeanGitRepo(url=local_repo_path, commit="main")
from_path_repo = LeanGitRepo.from_path(local_repo_path)

In general, you don't need to connect to GitHub at all when working with repositories of type local or remote.

However, if the lean version cannot be obtained from the toolchain file, it'll use the get_lean4_commit_from_config method, which interacts with GitHub:

config = self.get_config("lean-toolchain")
toolchain = config["content"]
m = _LEAN4_VERSION_REGEX.fullmatch(toolchain.strip())
if m is not None:
    lean_version = m["version"]
else:
    lean_version = get_lean4_version_from_config(toolchain)

A note on repository types:

The git.Repo object requires a path to the actual Git repository, whereas the github.Repository object only needs a GitHub URL. Thus, the url_to_repo function returns different objects based on the repo_type:

  • github: Returns a github.Repository object.
  • remote: Clones the Git repository to a temporary directory and returns a git.Repo object.
  • local: Copies the Git repository to a temporary directory and returns a git.Repo object.

image

@LuoKaiGSW
Copy link

LuoKaiGSW commented Jul 22, 2024

Yes @LuoKaiGSW , you can use a local Git repository by passing the path in the url parameter or using LeanGitRepo.from_path.

See test examples here.

import os
# Avoid using remote cache
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
lean4_example_url = 'https://github.com/yangky11/lean4-example'
local_repo_path = "lean4-example"
clean_clone_and_checkout(lean4_example_url, local_repo_path, 'main')

local_repo = LeanGitRepo(url=local_repo_path, commit="main")
from_path_repo = LeanGitRepo.from_path(local_repo_path)

In general, you don't need to connect to GitHub at all when working with repositories of type local or remote.

However, if the lean version cannot be obtained from the toolchain file, it'll use the get_lean4_commit_from_config method, which interacts with GitHub:

config = self.get_config("lean-toolchain")
toolchain = config["content"]
m = _LEAN4_VERSION_REGEX.fullmatch(toolchain.strip())
if m is not None:
    lean_version = m["version"]
else:
    lean_version = get_lean4_version_from_config(toolchain)

A note on repository types:

The git.Repo object requires a path to the actual Git repository, whereas the github.Repository object only needs a GitHub URL. Thus, the url_to_repo function returns different objects based on the repo_type:

  • github: Returns a github.Repository object.
  • remote: Clones the Git repository to a temporary directory and returns a git.Repo object.
  • local: Copies the Git repository to a temporary directory and returns a git.Repo object.

image

hi, @RexWzh Thanks for your reply!
I pulled the code from your repository and installed it using 'pip install .'. However, I encountered some errors during testing. Could you help me take a look at what the issue might be?
code

from lean_dojo import *
repo = LeanGitRepo.from_path("/Users/luokai/Desktop/lk/project/lean_project/lean4-example")

error

2024-07-22 11:57:25.530 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='81b57776e7a7dd1270b62598750e9fddc2470127')
✔ [1/2] Built Lean4Example
Build completed successfully.
2024-07-22 11:57:31.562 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/Users/luokai/Desktop/lk/project/lean_project/lean4-example', commit='81b57776e7a7dd1270b62598750e9fddc2470127')
✔ [1/2] Built Lean4Example
Build completed successfully.
Traceback (most recent call last):
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/luokai/Desktop/lk/project/lean_project/lean_py/main.py", line 16, in <module>
    traced_repo = trace(repo)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 250, in trace
    cached_path = get_traced_repo_path(repo, build_deps)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 212, in get_traced_repo_path
    _trace(repo, build_deps)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 157, in _trace
    with launch_progressbar(dirs_to_monitor):
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 65, in launch_progressbar
    p.start()
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/luokai/opt/anaconda3/envs/lean/lib/python3.10/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.
        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:
            if __name__ == '__main__':
                freeze_support()
                ...
        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

system info
MacBook-Pro.local 23.1.0 Darwin Kernel Version 23.1.0

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 22, 2024

I've tested on both mac and Ubuntu and couldn't reproduce the issue you mentioned.

Please ensure that your cloned repository is up-to-date.

Mac:

❯ sw_vers
ProductName:            macOS
ProductVersion:         14.5
BuildVersion:           23F79
❯ ls
LICENSE        docs           lean4-example  pyproject.toml src
README.md      images         mypy.ini       scripts        tests
❯ python
Python 3.10.14 (main, May  6 2024, 14:47:20) [Clang 14.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import LeanGitRepo
>>> LeanGitRepo.from_path('lean4-example')
LeanGitRepo(url='/Users/wangzhihong/workspace/LEAN/LeanDojo/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> LeanGitRepo('lean4-example', 'main')
LeanGitRepo(url='/Users/wangzhihong/workspace/LEAN/LeanDojo/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> 

Ubuntu:

❯ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy
❯ python
Python 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lean_dojo import LeanGitRepo
>>> LeanGitRepo('lean4-example', 'main')
LeanGitRepo(url='/home/cuber/zhihong/LeanPorjects/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')
>>> LeanGitRepo.from_path('lean4-example')
LeanGitRepo(url='/home/cuber/zhihong/LeanPorjects/lean4-example', commit='7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f')

tests/conftest.py Outdated Show resolved Hide resolved
@yangky11
Copy link
Member

I encounter the following error when testing this PR.

Starting from an empty cache, I ran:

from lean_dojo import *
repo = LeanGitRepo("https://github.com/leanprover-community/mathlib4", "29dcec074de168ac2bf835a77ef68bbe069194c5")
trace(repo)

It successfully traced the repo but saved the traced repo to ~/.cache/lean_dojo/tmp83kwc9x5-mathlib4-29dcec074de168ac2bf835a77ef68bbe069194c5 instead of ~/.cache/lean_dojo/leanprover-community-mathlib4-29dcec074de168ac2bf835a77ef68bbe069194c5.

Then, I exited Python, started a new session, and ran the above code again. Instead of re-using the traced repo in the cache, it started to trace the repo again.

@yangky11
Copy link
Member

I tried tracing https://github.com/yangky11/lean4-example and had a similar issue.

image

@yangky11
Copy link
Member

For comparison, this is using LeanDojo's current main branch:
image

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 24, 2024

Thanks for pointing that out. I noticed the issue, and it is fixed in the commit use tempdir for local type & resolve cache problem.

# Remove cache directory
import os
os.environ['NUM_PROCS'] = '128'
os.environ['DISABLE_REMOTE_CACHE'] = 'true'
from lean_dojo import LeanGitRepo, trace, Dojo, Theorem

repo = LeanGitRepo('/home/cuber/zhihong/LeanPorjects/leandojo/lean4-example', 'main')
trace_repo = trace(repo)

The cache file is located at:

❯ ls ~/.cache/lean_dojo
leandojo-lean4-example-7b6ecb9ad4829e4e73600a3329baeb3b5df8d23f

However, I did not run tests for this commit and later ones.

I believe it would be more effective to split these changes into smaller PRs.

#188

@LuoKaiGSW
Copy link

LuoKaiGSW commented Jul 24, 2024

I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.

I would like to know which branch can be used to load the local git repository. My current branch situation is as follows, and I am still encountering the same error, Thank you!

2024-07-24 07:06:03.269 | WARNING  | lean_dojo:<module>:33 - Running LeanDojo as the root user may cause unexpected issues. Proceed with caution.
2024-07-24 07:06:03.285 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/home/lean4-example', commit='3d3a1dd2cdaf9251de7e0575a85c702e7086db60')
Build completed successfully.
 19%|████████████████████████████████████████████▌                                                                                                                                                                                23%|████████████████████████████████████████████████████▉                                                                                                                                                                        25%|█████████████████████████████████████████████████████████▊                                                                                                                                                                   27%|███████████████████████████████████████████████████████████████▍                                                                                                                                                            100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 906/907 [03:20<00:00,  2.32it/s]
✖ [1/2] Building Lean4Repl
trace: .> LEAN_PATH=././.lake/build/lib LD_LIBRARY_PATH=././.lake/build/lib /root/.elan/toolchains/leanprover--lean4---v4.9.0-rc3/bin/lean ././././Lean4Repl.lean -R ./././. -o ././.lake/build/lib/Lean4Repl.olean -i ././.lake/build/lib/Lean4Repl.ilean -c ././.lake/build/ir/Lean4Repl.c --json
info: stderr:
resource exhausted (error code: 28, no space left on device)
error: Lean exited with code 1
Some builds logged failures:
- Lean4Repl
error: build failed
2024-07-24 07:09:41.940 | WARNING  | lean_dojo.data_extraction.trace:_trace:185 - Failed to build Lean4Repl. You may run into issues when interacting with the repo.
2024-07-24 07:09:41.946 | INFO     | lean_dojo.utils:execute:116 - 
2024-07-24 07:09:41.946 | ERROR    | lean_dojo.utils:execute:117 - fatal: not a git repository (or any of the parent directories): .git

Traceback (most recent call last):
  File "/home/lean4-example/scripts/test.py", line 19, in <module>
    traced_repo = trace(repo)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 248, in trace
    cached_path = get_traced_repo_path(repo, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path
    traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files
    repo = LeanGitRepo.from_path(root_dir)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 494, in from_path
    url, commit = get_repo_info(path)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info
    commit_msg, _ = execute(f"git log -n 1", capture_output=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute
    raise ex
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128.

image

@yangky11 yangky11 changed the base branch from main to dev July 24, 2024 19:55
elif os.path.exists(parsed_url.path):
return "local"
else:
logger.warning(f"{url} is not a valid URL")

@cache
def get_repo_info(path: Path) -> Tuple[str, str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with working_directory(path) is still needed in this function? There is no guarantee the CWD will be the same as path when calling it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the same error as @LuoKaiGSW and think this is the reason.

@LuoKaiGSW
Copy link

LuoKaiGSW commented Jul 25, 2024

I'll break this PR into smaller PRs to ensure stability at each stage and make code reviews easier.

I would like to know which branch can be used to load the local git repository. My current branch situation is as follows, and I am still encountering the same error, Thank you!

2024-07-24 07:06:03.269 | WARNING  | lean_dojo:<module>:33 - Running LeanDojo as the root user may cause unexpected issues. Proceed with caution.
2024-07-24 07:06:03.285 | INFO     | lean_dojo.data_extraction.trace:get_traced_repo_path:209 - Tracing LeanGitRepo(url='/home/lean4-example', commit='3d3a1dd2cdaf9251de7e0575a85c702e7086db60')
Build completed successfully.
 19%|████████████████████████████████████████████▌                                                                                                                                                                                23%|████████████████████████████████████████████████████▉                                                                                                                                                                        25%|█████████████████████████████████████████████████████████▊                                                                                                                                                                   27%|███████████████████████████████████████████████████████████████▍                                                                                                                                                            100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 906/907 [03:20<00:00,  2.32it/s]
✖ [1/2] Building Lean4Repl
trace: .> LEAN_PATH=././.lake/build/lib LD_LIBRARY_PATH=././.lake/build/lib /root/.elan/toolchains/leanprover--lean4---v4.9.0-rc3/bin/lean ././././Lean4Repl.lean -R ./././. -o ././.lake/build/lib/Lean4Repl.olean -i ././.lake/build/lib/Lean4Repl.ilean -c ././.lake/build/ir/Lean4Repl.c --json
info: stderr:
resource exhausted (error code: 28, no space left on device)
error: Lean exited with code 1
Some builds logged failures:
- Lean4Repl
error: build failed
2024-07-24 07:09:41.940 | WARNING  | lean_dojo.data_extraction.trace:_trace:185 - Failed to build Lean4Repl. You may run into issues when interacting with the repo.
2024-07-24 07:09:41.946 | INFO     | lean_dojo.utils:execute:116 - 
2024-07-24 07:09:41.946 | ERROR    | lean_dojo.utils:execute:117 - fatal: not a git repository (or any of the parent directories): .git

Traceback (most recent call last):
  File "/home/lean4-example/scripts/test.py", line 19, in <module>
    traced_repo = trace(repo)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 248, in trace
    cached_path = get_traced_repo_path(repo, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/trace.py", line 213, in get_traced_repo_path
    traced_repo = TracedRepo.from_traced_files(tmp_dir / repo.name, build_deps)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/traced_data.py", line 1094, in from_traced_files
    repo = LeanGitRepo.from_path(root_dir)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/data_extraction/lean.py", line 494, in from_path
    url, commit = get_repo_info(path)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 186, in get_repo_info
    commit_msg, _ = execute(f"git log -n 1", capture_output=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 118, in execute
    raise ex
  File "/root/miniconda3/envs/lean/lib/python3.10/site-packages/lean_dojo/utils.py", line 113, in execute
    res = subprocess.run(cmd, shell=True, capture_output=capture_output, check=True)
  File "/root/miniconda3/envs/lean/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'git log -n 1' returned non-zero exit status 128.

image

Thanks a lot @RexWzh. It works and I can trace the local repo! I've decided to temporarily use this version of the code.
git reset --hard 95714323227f79d7927b4081b1e8fb15ef864d14

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 25, 2024

@LuoKaiGSW Thanks for assisting with testing the commit.

Please note that this branch is currently under development and might be unstable until all the code has been thoroughly reviewed.

I'll fix that in later commits.

@yangky11
Copy link
Member

Hi @RexWzh, is this PR ready for further review and testing?

@RexWzh
Copy link
Contributor Author

RexWzh commented Jul 29, 2024

Yes, tests are underway and most of the repositories pass successfully, except for mathlib4, which fails at times due to network issues.

@yangky11
Copy link
Member

Got this when running one of the tests:

============================= test session starts ==============================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /private/home/kaiyuy/LeanDojo
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 48 items

tests/data_extraction/test_cache.py F

=================================== FAILURES ===================================
_______________________________ test_repo_cache ________________________________

lean4_example_url = 'https://github.com/yangky11/lean4-example'
remote_example_url = 'https://gitee.com/rexzong/lean4-example'
example_commit_hash = '3f8c5eb303a225cdef609498b8d87262e5ef344b'

    def test_repo_cache(lean4_example_url, remote_example_url, example_commit_hash):
        # Note: The `git.Repo` requires the local repo to be cloned in a directory
        # all cached repos are stored in CACHE_DIR/repos
        prefix = "repos"
        repo_name = "lean4-example"
    
        # test local repo cache
        with working_directory() as tmp_dir:
            repo = Repo.clone_from(lean4_example_url, repo_name)
            repo.git.checkout(example_commit_hash)
            local_dir = tmp_dir / repo_name
            rel_cache_dir = (
                prefix / Path(f"gitpython-{repo_name}-{example_commit_hash}") / repo_name
            )
            cache.store(local_dir, rel_cache_dir)
        # get the cache
        repo_cache_dir = cache.get(rel_cache_dir)
        assert repo_cache_dir is not None
    
        # test remote repo cache
        with working_directory() as tmp_dir:
>           repo = Repo.clone_from(remote_example_url, repo_name)

tests/data_extraction/test_cache.py:29: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/repo/base.py:1525: in clone_from
    return cls._clone(
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/repo/base.py:1396: in _clone
    finalize_process(proc, stderr=stderr)
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/util.py:504: in finalize_process
    proc.wait(**kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <git.cmd.Git.AutoInterrupt object at 0x7fa7e24b3fc0>
stderr = "Cloning into 'lean4-example'...\nfatal: unable to access 'https://gitee.com/rexzong/lean4-example/': The requested URL returned error: 502\n"

    def wait(self, stderr: Union[None, str, bytes] = b"") -> int:
        """Wait for the process and return its status code.
    
        :param stderr:
            Previously read value of stderr, in case stderr is already closed.
    
        :warn:
            May deadlock if output or error pipes are used and not handled
            separately.
    
        :raise git.exc.GitCommandError:
            If the return status is not 0.
        """
        if stderr is None:
            stderr_b = b""
        stderr_b = force_bytes(data=stderr, encoding="utf-8")
        status: Union[int, None]
        if self.proc is not None:
            status = self.proc.wait()
            p_stderr = self.proc.stderr
        else:  # Assume the underlying proc was killed earlier or never existed.
            status = self.status
            p_stderr = None
    
        def read_all_from_possibly_closed_stream(stream: Union[IO[bytes], None]) -> bytes:
            if stream:
                try:
                    return stderr_b + force_bytes(stream.read())
                except (OSError, ValueError):
                    return stderr_b or b""
            else:
                return stderr_b or b""
    
        # END status handling
    
        if status != 0:
            errstr = read_all_from_possibly_closed_stream(p_stderr)
            _logger.debug("AutoInterrupt wait stderr: %r" % (errstr,))
>           raise GitCommandError(remove_password_if_present(self.args), status, errstr)
E           git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
E             cmdline: git clone -v -- https://gitee.com/rexzong/lean4-example lean4-example
E             stderr: 'Cloning into 'lean4-example'...
E           fatal: unable to access 'https://gitee.com/rexzong/lean4-example/': The requested URL returned error: 502
E           '

../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/git/cmd.py:834: GitCommandError
=============================== warnings summary ===============================
../miniconda3/envs/lean-dojo/lib/python3.11/site-packages/jupyter_client/connect.py:22
  /private/home/kaiyuy/miniconda3/envs/lean-dojo/lib/python3.11/site-packages/jupyter_client/connect.py:22: DeprecationWarning: Jupyter is migrating its paths to use standard platformdirs
  given by the platformdirs library.  To remove this warning and
  see the appropriate new directories, set the environment variable
  `JUPYTER_PLATFORM_DIRS=1` and then run `jupyter --paths`.
  The use of platformdirs will be the default in `jupyter_core` v6
    from jupyter_core.paths import jupyter_data_dir, jupyter_runtime_dir, secure_write

src/lean_dojo/data_extraction/trace.py:97
  /private/home/kaiyuy/LeanDojo/src/lean_dojo/data_extraction/trace.py:97: DeprecationWarning: invalid escape sequence '\*'
    """Check if all \*.lean files have been processed to produce \*.ast.json and \*.dep_paths files."""

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/data_extraction/test_cache.py::test_repo_cache - git.exc.GitComm...
!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
======================== 1 failed, 2 warnings in 20.04s ========================

@yangky11
Copy link
Member

yangky11 commented Jul 30, 2024

I cannot reproduce the above error reliably (probably a network issue). Running other tests now...

@yangky11 yangky11 merged commit ad23694 into lean-dojo:dev Jul 30, 2024
2 checks passed
@yangky11
Copy link
Member

All other tests were successful. I merged it into dev and will merge into main after some minor cleaning. Thank you so much for this valuable contribution.

@yangky11
Copy link
Member

yangky11 commented Aug 6, 2024

Merged into main and released in v2.1.0. Thanks everyone!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants