HTCONDOR-1323 add job removal debugging docs #615

Closed
15 changes: 15 additions & 0 deletions docs/v23/troubleshooting/common-issues.md
@@ -550,6 +550,21 @@ This means that the `condor_job_router_info` (note this is not the CE version),
2. You have installed HTCondor in a non-standard location that is not in your `PATH`.
3. The `condor_job_router_info` tool itself wasn't available until Condor-8.2.3-1.1 (available in osg-upcoming).

### Troubleshooting removed jobs

Because each job exists as two copies in the CE, finding the root cause of a removed job requires a bit of
sleuthing. Given a specific job ID from the CE logs, first look up the job ad in the history with the
`condor_ce_history` tool and check the value of its `GridJobId` attribute:

``` console
user@host $ condor_ce_history <JOB_ID> -af GridJobId
```

If the `GridJobId` is *undefined*, the client gridmanager removed the job, and you need to contact the
administrator of that client gridmanager to determine why.

If `GridJobId` is set to a value, the CE gridmanager removed the job.
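
The two cases above can be wrapped in a small shell helper. This is only a sketch: `classify_removal` is a
hypothetical function name and not part of HTCondor-CE, and it assumes that `-af` prints the literal string
`undefined` when the attribute is not set and prints nothing when no matching job ad is found.

```shell
# Hypothetical helper (not shipped with HTCondor-CE).
# Takes the text printed by `condor_ce_history <JOB_ID> -af GridJobId`
# and reports which gridmanager removed the job.
classify_removal() {
    grid_job_id="$1"
    case "$grid_job_id" in
        "")
            # Assumption: empty output means no job ad matched the job ID.
            echo "no job ad found"
            ;;
        undefined)
            # GridJobId is not set: the client gridmanager removed the job.
            echo "client gridmanager"
            ;;
        *)
            # GridJobId has a value: the CE gridmanager removed the job.
            echo "CE gridmanager"
            ;;
    esac
}
```

To use it, pass the command output as the only argument, for example
`classify_removal "$(condor_ce_history <JOB_ID> -af GridJobId)"`.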

Getting Help
------------
