
Weird agent to vault issue. #113

Open
Christiaan-Mathu opened this issue Aug 10, 2023 · 12 comments

@Christiaan-Mathu

Hi all, hope you are well.

I have set up our own agents, and they look like they're running fine.

We also host our own vault on some of our servers.

I managed to get the agents to upload to the vault and everything looks fine, but when there are a lot of events/videos in a short time the agents start backing up, and after a while they completely fail to upload new videos. The logs show the following:

Agent Log:
{"level":"info","msg":"UploadKerberosVault: Uploading to Kerberos Vault (https://ourVault.com/api)","time":"2023-08-10T08:20:06+02:00"} {"level":"info","msg":"UploadKerberosVault: Upload started for 1691532014_6-967003_Cam 1_200-200-400-400_276_769.mp4","time":"2023-08-10T08:20:06+02:00"} {"level":"info","msg":"UploadKerberosVault: Upload Failed, file doesn't exists anymore.","time":"2023-08-10T08:20:06+02:00"} {"level":"error","msg":"HandleUpload: UploadKerberosVault: Upload Failed, file doesn't exists anymore.","time":"2023-08-10T08:20:06+02:00"}
Please let me know if there is more info I can give to help narrow down what is happening 😄

@cedricve
Member

Thank you for reporting @Christiaan-Mathu. Which Kerberos Agent version are you using?

@Christiaan-Mathu
Author

No problem @cedricve. I have tried multiple versions and the problem still persists; I am currently using the latest dev Docker image (kerberos/agent-dev:latest).


@cedricve
Member

Can you run version 384448d? It resolved an issue with uploading and cleanup.
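
If you run the agent through Docker Compose, pinning that build is just a matter of pointing the image at the commit tag (assuming the hash is published as a tag on Docker Hub), e.g. in your service definition:

    image: "kerberos/agent:384448d"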

@Christiaan-Mathu
Author

Will do, I'll let you know if it fixes the problem. Thanks in advance 😄

@Christiaan-Mathu
Author

Okay, the majority of the recordings are now uploading, but some still struggle. I saw in the logs that the agent is getting a 500 response from the vault, so I checked the vault logs and found the following error:

2023/08/10 08:04:20 [Recovery] 2023/08/10 - 08:04:20 panic recovered:
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.327857ms |    192.168.10.1 | POST     "/api/storage"
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.360449ms |    192.168.10.1 | POST     "/api/storage"
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:260 (0x451515)
/usr/local/go/src/runtime/signal_unix.go:835 (0x4514e5)
/go/pkg/mod/github.com/!eyevinn/[email protected]/mp4/ftyp.go:85 (0xe98bb0)
/go/src/gitlab.com/kerberos-io/vault/api/mp4/fragmented.go:20 (0xe98ba8)
/go/src/gitlab.com/kerberos-io/vault/api/storage/main.go:279 (0x108aee7)
/go/src/gitlab.com/kerberos-io/vault/api/main.go:235 (0x10cda54)
/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174 (0xb04414)
/go/pkg/mod/github.com/zsais/[email protected]/middleware.go:364 (0xb043f5)
/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174 (0x9434c1)
/go/pkg/mod/github.com/gin-gonic/[email protected]/recovery.go:102 (0x9434ac)
/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174 (0x9425c6)
/go/pkg/mod/github.com/gin-gonic/[email protected]/logger.go:240 (0x9425a9)
/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174 (0x9434c1)
/go/pkg/mod/github.com/gin-gonic/[email protected]/recovery.go:102 (0x9434ac)
/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174 (0x9425c6)
/go/pkg/mod/github.com/gin-gonic/[email protected]/logger.go:240 (0x9425a9)
/go/pkg/mod/github.com/gin-gonic/[email protected]/context.go:174 (0x941650)
/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:620 (0x9412b8)
/go/pkg/mod/github.com/gin-gonic/[email protected]/gin.go:576 (0x940dfc)
/usr/local/go/src/net/http/server.go:2947 (0x6eb6eb)
/usr/local/go/src/net/http/server.go:1991 (0x6e6906)
/usr/local/go/src/runtime/asm_amd64.s:1594 (0x46eda0)

@cedricve
Member

Can you disable MP4 fragmentation? I'm not sure which cameras you are using, but the vault failed to read out the fragmented MP4 metadata.


@Christiaan-Mathu
Author

Hi @cedricve, I have not enabled it; it's still disabled on the agents.

@Christiaan-Mathu
Author

Here is a sample of my agent configuration (Docker Compose) with the secrets removed:

kerberos-agent1:
    image: "kerberos/agent:384448d"
    restart: always
    ports:
      - "8081:80"
    environment:
      - AGENT_NAME=Cam 1
      - AGENT_USERNAME=admin
      - AGENT_PASSWORD=pass
      - AGENT_TIMEZONE=Africa/Johannesburg
      - AGENT_CAPTURE_IPCAMERA_RTSP=rtsp://192.168.10.2:554/unicast/c1/s1/live
      - AGENT_CAPTURE_IPCAMERA_SUB_RTSP=rtsp://192.168.10.2:554/unicast/c1/s1/live
      - AGENT_CAPTURE_PRERECORDING=2
      - AGENT_CAPTURE_POSTRECORDING=2
      - AGENT_CAPTURE_MAXLENGTH=5
      - AGENT_CAPTURE_PIXEL_CHANGE=150
      - AGENT_CLOUD=kstorage
      - AGENT_KERBEROSVAULT_URI=https://myvault.com/api
      - AGENT_KERBEROSVAULT_ACCESS_KEY=key
      - AGENT_KERBEROSVAULT_SECRET_KEY=secret
      - AGENT_KERBEROSVAULT_PROVIDER=MyLocalMinio
      - AGENT_KERBEROSVAULT_DIRECTORY=office
      - AGENT_REGION_POLYGON=0,50;330,50;330,0;1280,0;1280,720;0,720

@Christiaan-Mathu
Author

Hi @cedricve, have you had a chance to investigate this problem?

It's quite weird because some of the agents work very well but some just don't work...

I am still getting this error when the failing agents try to upload

@Christiaan-Mathu
Author

Christiaan-Mathu commented Aug 18, 2023

Hey @cedricve, I'll try to break down what I think is happening; please correct me if I'm mistaken.

Here's what I think is going on:

Connection Loss Issue: When the link between the agent and the vault breaks, video backups are queued up and can't be sent for uploading.

Upload Sequence After Reconnection: Once the connection is reestablished, the system starts uploading videos from the backup. It processes the videos in the order they were created, starting with the oldest video. This sequence follows the upload file naming convention.

Auto-Clean Interference: However, there's a problem. The system's auto-clean function sometimes deletes the video files that are being uploaded. This causes the upload to fail.

Retrying After Failure: After a failed upload, the system waits for quite a while before attempting to upload the video again.

A possible solution to this issue could involve modifying a specific loop in the system's code. The current loop structure reads all files in the directory for uploading, which can lead to conflicts with the auto-clean process. Here's a suggested alteration:

Original Loop:

While(True){
    files = readDir()

    if (files != null) {
        for file in files{
            doUpload(file) // The file object might be deleted due to auto cleaning
        }
    } else {
        wait(delay)
    }
}

Proposed Solution:

While(True){
    file = getOneFileInDir()
    
    if (file != null) {
        doUpload(file)
    } else {
        wait(delay)
    }
}

In this proposed solution, the system would focus on one file at a time, mitigating the conflict with auto-cleaning. This could help ensure successful video uploads without the interference caused by the auto-clean function.
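
To make the idea concrete, here is a rough Go sketch of the single-file loop. Helper names such as oldestFileIn and uploadToVault are just placeholders for illustration, not the agent's actual functions:

package uploader

import (
    "log"
    "os"
    "path/filepath"
    "sort"
    "time"
)

// oldestFileIn returns the path of the oldest recording in dir, or "" when empty.
// Recording filenames start with a Unix timestamp, so a lexical sort is enough.
func oldestFileIn(dir string) string {
    entries, err := os.ReadDir(dir)
    if err != nil {
        return ""
    }
    names := []string{}
    for _, e := range entries {
        if !e.IsDir() {
            names = append(names, e.Name())
        }
    }
    if len(names) == 0 {
        return ""
    }
    sort.Strings(names)
    return filepath.Join(dir, names[0])
}

// UploadLoop uploads one file at a time, so a file deleted by auto-clean only
// affects a single iteration instead of a whole batch that was read upfront.
func UploadLoop(dir string, delay time.Duration, uploadToVault func(path string) error) {
    for {
        file := oldestFileIn(dir)
        if file == "" {
            time.Sleep(delay)
            continue
        }
        if err := uploadToVault(file); err != nil {
            // The file may have been removed by auto-clean in the meantime.
            log.Printf("upload failed for %s: %v", file, err)
            time.Sleep(delay)
            continue
        }
        // Remove the recording locally once the vault confirmed the upload.
        if err := os.Remove(file); err != nil {
            log.Printf("cleanup failed for %s: %v", file, err)
        }
    }
}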

Please let me know if this explanation aligns with your understanding or if there are any corrections needed.

@cedricve
Member

Thanks @Christiaan-Mathu for the explanation. So we did fix something in the latest release: https://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/Cloud.go#L114-L141

Looking at your previous comment, it actually shows that the video was recorded and recognised as a fragmented MP4.

2023/08/10 08:04:20 [Recovery] 2023/08/10 - 08:04:20 panic recovered:
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.327857ms |    192.168.10.1 | POST     "/api/storage"
[GIN] 2023/08/10 - 08:04:20 | 500 |    2.360449ms |    192.168.10.1 | POST     "/api/storage"
runtime error: invalid memory address or nil pointer dereference
/usr/local/go/src/runtime/panic.go:260 (0x451515)
/usr/local/go/src/runtime/signal_unix.go:835 (0x4514e5)
/go/pkg/mod/github.com/!eyevinn/[email protected]/mp4/ftyp.go:85 (0xe98bb0)
/go/src/gitlab.com/kerberos-io/vault/api/mp4/fragmented.go:20 (0xe98ba8)

@cedricve
Member

cedricve commented Aug 18, 2023

However, I do get your point. The moment the upload fails due to network connectivity, we will "slow down" for x seconds (https://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/Cloud.go#L136-L139) and try to upload the next one.

On the next run we will check if the video is still available (https://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/KerberosHub.go#L40-L48) and if not, it will be marked as to be removed (https://github.com/kerberos-io/agent/blob/master/machinery/src/cloud/Cloud.go#L131).
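
Simplified, the behaviour described above looks roughly like this. This is only an illustrative sketch, not the actual Cloud.go code; markForRemoval and errConnectivity are placeholders:

package pending

import (
    "errors"
    "os"
    "time"
)

// errConnectivity stands in for whatever error the uploader returns when the
// vault cannot be reached.
var errConnectivity = errors.New("vault unreachable")

// ProcessPending walks the queued recordings once: missing files are marked
// for removal, connectivity failures trigger a back-off before the next file.
func ProcessPending(files []string, upload func(path string) error, backoff time.Duration) {
    for _, f := range files {
        if _, err := os.Stat(f); os.IsNotExist(err) {
            // Already cleaned up locally: mark it, don't treat it as a failure.
            markForRemoval(f)
            continue
        }
        if err := upload(f); err != nil {
            if errors.Is(err, errConnectivity) {
                time.Sleep(backoff) // "slow down" before trying the next one
            }
            continue
        }
        markForRemoval(f) // uploaded successfully, safe to delete locally
    }
}

// markForRemoval is a placeholder for the agent's cleanup bookkeeping.
func markForRemoval(path string) {}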

@cedricve self-assigned this Sep 7, 2023
@cedricve added the help wanted label Sep 7, 2023