Various errors when running a spark cluster driver with mirrord #2929

Open
t4lz opened this issue Nov 22, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@t4lz
Member

t4lz commented Nov 22, 2024

A user runs a Spark cluster driver with mirrord.
The program downloads large JAR files from S3 and then uploads them to the workers. This transfer seems to be slow when running with mirrord.
At some point the ping-pong between the intproxy and the agent fails.

Additionally, there is this error log from mirrord:

2024-11-21T12:22:31.234085Z ERROR ThreadId(02) mirrord_layer::socket::ops: connect -> Failed call to libc::connect with ConnectResult {
    result: -1,
    error: Some(
        Errno {
            code: 22,
            description: Some(
                "Invalid argument",
            ),
        },
    ),
}
2024-11-21T12:22:31.234203Z ERROR ThreadId(02) mirrord_layer::error: Error occured in Layer >> IO(Os { code: 22, kind: InvalidInput, message: "Invalid argument" })
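
For reading the second log line: the `IO(Os { code: 22, kind: InvalidInput, ... })` payload looks like a standard Rust `std::io::Error` built from the raw errno that `libc::connect` returned. The sketch below is not mirrord code. It only illustrates, assuming a Linux or macOS host where errno 22 is `EINVAL`, how the standard library maps that code to `ErrorKind::InvalidInput`. The helper name `os_error` is hypothetical.

```rust
use std::io;

/// Hypothetical helper: wrap a raw errno value in a std::io::Error,
/// which is how an OS-level failure typically surfaces in Rust code.
fn os_error(code: i32) -> io::Error {
    io::Error::from_raw_os_error(code)
}

fn main() {
    // Assumption: Unix-like host, where 22 is EINVAL ("Invalid argument").
    let err = os_error(22);

    // The Debug form should have the same shape as the payload in the log.
    println!("{:?}", err);

    // The portable classification that the standard library assigns to EINVAL.
    println!("{:?}", err.kind());
}
```

This only tells us that `connect` itself rejected its arguments (for example an address or address length it considered invalid). It does not say which connection attempt in the Spark driver triggered it.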

Sometimes the run seems to be successful (enough?) despite the errors.

@t4lz t4lz added the bug Something isn't working label Nov 22, 2024

linear bot commented Nov 22, 2024

@t4lz t4lz removed their assignment Dec 3, 2024