
Bug: Uploading .json through API not working #1012

Open

F1gueron opened this issue Dec 12, 2024 · 8 comments
Labels: bug (Something isn't working), needs more info (This issue requires more information), triage (This issue requires triaging)

Comments

@F1gueron

F1gueron commented Dec 12, 2024

Description:

Hey, I'm trying to upload .json files through the API. I get a 202 status code, which suggests it worked, but on the file ingest page the job always stays in Running, or sometimes changes to Cancelled. I tried uploading the same .json manually through the UI and it works fine, so it may be my code. At first I tried uploading a single file with all the JSON in it, but that didn't work, so I started submitting every JSON file individually. If the solution lets me upload just the zip, that would be great.

Are you intending to fix this bug?

"no"

Component(s) Affected:

  • API

Steps to Reproduce:

  1. Go to [specific page or endpoint]
  2. Click on [button/element/etc.]
  3. Enter [input/data]
  4. See error at [this point]

Expected Behavior:

I expect the data to upload and be ingested correctly.

Actual Behavior:

I'm getting a 202 status code, which suggests the upload was accepted, but then this happens:

[screenshot: file ingest page showing the job stuck in Running]

Screenshots/Code Snippets/Sample Files:

    def create_upload_job(self):
        """Creates a file upload job."""
        response = self._request("POST", "/api/v2/file-upload/start")
        return response

    def upload_file(self, zip_file_path):
        """Uploads all JSON files inside a ZIP to the backend."""
        # Step 1: Extract ZIP file contents
        with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
            extract_path = os.path.splitext(zip_file_path)[0]
            zip_ref.extractall(extract_path)

        # Step 2: Process each JSON file
        for root, _, files in os.walk(extract_path):
            for file_name in files:
                if file_name.endswith('.json'):  # Only process JSON files
                    file_path = os.path.join(root, file_name)
                    with open(file_path, 'rb') as file:
                        file_content = file.read()

                    # Step 3: Create an upload job for each file
                    upload_job = self.create_upload_job()
                    upload_job_data = upload_job.json()
                    file_upload_job_id = upload_job_data['data']['id']

                    # Step 4: Upload the file content to the job
                    response = self._request(
                        method="POST",
                        uri=f"/api/v2/file-upload/{file_upload_job_id}",
                        body=file_content
                    )
                    print("Data loaded. Waiting for data to be processed...")
                    sleep(30)

                    # Log the upload result
                    if response.status_code == 202:
                        print(f"Successfully uploaded {file_name}")
                    else:
                        print(f"Failed to upload {file_name}: {response.status_code} - {response.text}")

Environment Information:

BloodHound: 6.3.0

Collector: [SharpHound version / AzureHound version]

OS: Windows 11

Browser (if UI related): [browser name and version]

Node.js (if UI related): [Node.js version]

Go (if API related): [Go version]

Database (if persistence related): [Neo4j version / PostgreSQL version]

Docker (if using Docker): 4.36.0

Additional Information:

Any additional context or information that might be helpful in understanding and diagnosing the issue.

Potential Solution (optional):

If you have any ideas about what might be causing the issue or how it could be fixed, you can share them here.

Related Issues:

If you've found related issues in the project's issue tracker, mention them here.

Contributor Checklist:

  • I have searched the issue tracker to ensure this bug hasn't been reported before or is not already being addressed.
  • I have provided clear steps to reproduce the issue.
  • I have included relevant environment information details.
  • I have attached necessary supporting documents.
  • I have checked that any JSON files I am attempting to upload to BloodHound are valid.
F1gueron added the bug and triage labels on Dec 12, 2024
@F1gueron (Author)

I found out that I have to end the upload (POST to the /end endpoint) for the upload to actually do anything. But even though I wait 50 seconds before doing a GET to list the file statuses, some files are still getting cancelled.

@StephenHinck (Contributor)

A few thoughts:

  1. Is there a reason you're extracting the files from the .zip? BHCE supports .zip ingest and will save you a step.
  2. Can you provide the relevant API logs from this time period?
  3. Which files are getting canceled, and what are their associated error messages?
  4. You are correct that the file upload must be completed with a POST request to /api/v2/file-upload/$ID/end; see the sketch below for the full lifecycle.
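
Putting the whole lifecycle together, a minimal sketch looks like the following. The client object and its _request helper are placeholders standing in for the poster's signed-request code shown later in this thread; the status_message strings and status == 3 for cancellation are taken from the snippets below, and matching on the job id assumes the entries returned by GET /api/v2/file-upload carry the same id field as the /start response:

    from time import sleep

    def upload_zip(client, zip_path):
        """Start a job, upload the .zip, end the job, then poll until it settles."""
        # 1. Create the file-upload job
        job = client._request("POST", "/api/v2/file-upload/start").json()
        job_id = job["data"]["id"]

        # 2. Send the archive body (the upload request must be signed with
        #    Content-Type: application/zip for a .zip payload)
        with open(zip_path, "rb") as fh:
            client._request("POST", f"/api/v2/file-upload/{job_id}", body=fh.read())

        # 3. End the job; ingestion does not begin until this call is made
        client._request("POST", f"/api/v2/file-upload/{job_id}/end")

        # 4. Poll the job list until this job leaves the running state
        while True:
            jobs = client._request("GET", "/api/v2/file-upload").json()["data"]
            mine = next(j for j in jobs if j["id"] == job_id)
            if mine["status_message"] in ("Complete", "Partially Completed"):
                return mine
            if mine["status"] == 3:  # cancelled
                raise RuntimeError(f"upload job {job_id} was cancelled")
            sleep(10)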

@F1gueron (Author)

F1gueron commented Dec 19, 2024

  1. I also tried uploading the .zip directly, and it never works on the first upload; I need two tries.
  2. The logs only tell me whether the upload succeeded or not, with no further detail.
  3. For me the first upload always gets cancelled, regardless of whether it's a JSON or a ZIP, so it may be worth checking whether that's a bug or just my code.

This is my code:

def upload_file(self, zip_file_path):
    file_name = zip_file_path.split("/")[-1]
    with open(zip_file_path, "rb") as file:
        file_content = file.read()

    count = 0
    uploaded = False
    i = 0
    while not uploaded:
        upload_job = self.create_upload_job()
        upload_job_data = upload_job.json()
        file_upload_job_id = upload_job_data['data']['id']

        response_upload = self._requestZip(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}",
            body=file_content
        )
        print(f"{file_name} loaded. Waiting for data to be processed...")

        response_end = self._request(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}/end"
        )
        print(f"Data processing started for {file_name}")

        threshold = 20
        sleep(threshold)
        count += threshold

        while True:
            response_list = self._request(
                method="GET",
                uri="/api/v2/file-upload"
            )
            JSON_response = response_list.json()
            if JSON_response['data'][i]["status_message"] == "Partially Completed" or JSON_response['data'][0]["status_message"] == "Complete":
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i]['status'] == 3:
                print("Data processing was canceled. Error on file: " + file_name)
                i += 1
                break
            sleep(threshold)
            count += threshold

With this code it works, but only on the second try. I can stick with this, because it's a program where efficiency isn't important; it runs once a month. Thanks.
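
One fragility worth noting in the snippet above: it locates the job by list position (data[i], and data[0] in one of the checks), but GET /api/v2/file-upload returns all jobs, so positions shift as new jobs are created. A sketch of a poller that matches on the job id returned by /start instead (assuming the list entries carry the same id field; the other field names are taken from the code above):

    def wait_for_job(self, file_upload_job_id, poll_seconds=20, timeout=300):
        """Poll GET /api/v2/file-upload until the given job completes or is cancelled."""
        waited = 0
        while waited <= timeout:
            jobs = self._request(method="GET", uri="/api/v2/file-upload").json()['data']
            # Match on the job id instead of assuming a position in the list
            job = next((j for j in jobs if j['id'] == file_upload_job_id), None)
            if job is None:
                raise RuntimeError(f"Job {file_upload_job_id} not found in the job list")
            if job['status_message'] in ("Complete", "Partially Completed"):
                return job
            if job['status'] == 3:  # cancelled
                raise RuntimeError(f"Job {file_upload_job_id} was cancelled")
            sleep(poll_seconds)
            waited += poll_seconds
        raise TimeoutError(f"Job {file_upload_job_id} still running after {timeout}s")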

@StephenHinck (Contributor)

If you upload that same file via the UI, does it work fine there, or is it the same behavior?

@F1gueron (Author)

It works fine in the UI. It's only through the API that everything I upload works on the second try.

@StephenHinck (Contributor)

I asked one of our engineers to double-check this thread, and what you're doing appears to be correct. However, without additional logs or a view of the full code snippet you're running, it will be difficult for us to help you troubleshoot further.

StephenHinck added the needs more info label on Dec 20, 2024
@F1gueron (Author)

F1gueron commented Jan 9, 2025

Sorry for the delay, I was on vacation. This is the log info from the Docker logs:

2025-01-08 10:20:36 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:34.267421+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:33706","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"9f487a58-d9b5-439b-ad79-8f1eb9d7d426","request_bytes":135,"response_bytes":0,"status":204,"elapsed":55.816438,"time":"2025-01-08T09:20:36.403486091Z","message":"POST /api/v2/clear-database"}
2025-01-08 10:20:38 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:36.416511+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:33714","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"03f3f571-59d2-4b94-beb5-5b5132c1d2b9","request_bytes":0,"response_bytes":242,"status":201,"elapsed":36.096702,"time":"2025-01-08T09:20:38.520172569Z","message":"POST /api/v2/file-upload/start"}
2025-01-08 10:20:40 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:38.524101+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:60860","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"1b943a91-b714-461f-8ea5-2ea62fdd8fc9","request_bytes":2357123,"response_bytes":23,"status":202,"elapsed":38.98706,"time":"2025-01-08T09:20:40.616765926Z","message":"POST /api/v2/file-upload/179"}
2025-01-08 10:20:42 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:40.620114+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:60862","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"eb90e3e4-fc01-4711-a131-a5882477e903","request_bytes":0,"response_bytes":23,"status":200,"elapsed":13.837052,"time":"2025-01-08T09:20:42.69099955Z","message":"POST /api/v2/file-upload/179/end"}
2025-01-08 10:20:54 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:52.695233+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:36074","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"5a4e8bd1-7efb-4c02-92d3-0be006ca2801","request_bytes":0,"response_bytes":286,"status":200,"elapsed":9.800199,"time":"2025-01-08T09:20:54.741412353Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:06 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:04.743364+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:37594","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"90867637-a157-4347-981e-aa44c7a7a68e","request_bytes":0,"response_bytes":286,"status":200,"elapsed":9.302025,"time":"2025-01-08T09:21:06.792802522Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:18 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:16.794142+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:46976","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"5cabb1b5-68ed-4e5d-aecb-0f75dd12c9d0","request_bytes":0,"response_bytes":286,"status":200,"elapsed":7.73046,"time":"2025-01-08T09:21:18.837364685Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:21 {"level":"info","time":"2025-01-08T09:21:21.219227666Z","message":"Begin Purge Graph Data"}
2025-01-08 10:21:30 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:28.840273+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48348","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"ff14e0a3-9862-4956-992a-acb97d559645","request_bytes":0,"response_bytes":287,"status":200,"elapsed":9.071671,"time":"2025-01-08T09:21:30.906416528Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:32 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:30.910200+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48352","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"4cee149c-12fc-482b-8e0d-c5116689c644","request_bytes":0,"response_bytes":241,"status":201,"elapsed":24.422103,"time":"2025-01-08T09:21:32.977062734Z","message":"POST /api/v2/file-upload/start"}
2025-01-08 10:21:35 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:32.977797+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48356","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"4fde5bb1-4e14-4d6a-b6a1-61bd0e2a68c1","request_bytes":2357123,"response_bytes":23,"status":202,"elapsed":34.851713,"time":"2025-01-08T09:21:35.080274894Z","message":"POST /api/v2/file-upload/180"}
2025-01-08 10:21:37 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:35.081446+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48362","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"c60b012a-2fd5-497e-aec8-e6d2e8c2ba19","request_bytes":0,"response_bytes":23,"status":200,"elapsed":8.922208,"time":"2025-01-08T09:21:37.13901891Z","message":"POST /api/v2/file-upload/180/end"}
2025-01-08 10:21:46 {"level":"info","elapsed":25361.600634,"time":"2025-01-08T09:21:46.57344101Z","message":"Purge Graph Data Completed"}

As you can see, there are two upload attempts, but I can't find any info on why the first one fails.

@F1gueron (Author)

F1gueron commented Jan 9, 2025

def create_upload_job(self):
    """Creates a file upload job."""
    response = self._request("POST", "/api/v2/file-upload/start")
    return response

def upload_file(self, zip_file_path):
    file_name = zip_file_path.split("/")[-1]
    with open(zip_file_path, "rb") as file:
        file_content = file.read()

    uploaded = False
    i = 0
    while not uploaded:
        count = 0
        upload_job = self.create_upload_job()
        upload_job_data = upload_job.json()
        file_upload_job_id = upload_job_data['data']['id']

        response_upload = self._requestZip(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}",
            body=file_content
        )
        print(f"{file_name} loaded. Waiting for data to be processed...")

        response_end = self._request(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}/end"
        )
        print(f"Data processing started for {file_name}")

        threshold = 10
        sleep(threshold)
        count += threshold

        while True:
            print(f"Waiting for data to be processed... {count}s")
            response_list = self._request(
                method="GET",
                uri="/api/v2/file-upload"
            )
            JSON_response = response_list.json()
            if JSON_response['data'][i]["status_message"] in ("Partially Completed", "Complete", "Analyzing"):
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i - 1]["status_message"] in ("Partially Completed", "Complete", "Analyzing"):
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i]['status'] == 3:
                print("Data processing was canceled. Error on file: " + file_name)
                i += 1
                break
            sleep(threshold)
            count += threshold
            if count > 300:
                print("Data processed timeout on file: " + file_name)
                break
                
def _request(self, method: str, uri: str, body: Optional[bytes] = None) -> requests.Response:
    # Digester is initialized with HMAC-SHA-256 using the token key as the HMAC digest key.
    digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)

    # OperationKey is the first HMAC digest link in the signature chain. This prevents replay attacks that seek to
    # modify the request method or URI. It is composed of concatenating the request method and the request URI with
    # no delimiter and computing the HMAC digest using the token key as the digest secret.
    #
    # Example: GET /api/v1/test/resource HTTP/1.1
    # Signature Component: GET/api/v1/test/resource
    digester.update(f"{method}{uri}".encode())

    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)

    # DateKey is the next HMAC digest link in the signature chain. This encodes the RFC3339 formatted datetime
    # value as part of the signature to the hour to prevent replay attacks that are older than max two hours. This
    # value is added to the signature chain by cutting off all values from the RFC3339 formatted datetime from the
    # hours value forward:
    #
    # Example: 2020-12-01T23:59:60Z
    # Signature Component: 2020-12-01T23
    datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
    digester.update(datetime_formatted[:13].encode())

    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)

    # Body signing is the last HMAC digest link in the signature chain. This encodes the request body as part of
    # the signature to prevent replay attacks that seek to modify the payload of a signed request. In the case
    # where there is no body content the HMAC digest is computed anyway, simply with no values written to the
    # digester.
    if body is not None:
        digester.update(body)

    # Perform the request with the signed and expected headers
    return requests.request(
        method=method,
        url=self._format_url(uri),
        headers={
            "User-Agent": "bhe-python-sdk 0001",
            "Authorization": f"bhesignature {self._credentials.token_id}",
            "RequestDate": datetime_formatted,
            "Signature": base64.b64encode(digester.digest()),
            "Content-Type": "application/json",
        },
        data=body,
    )

def _requestZip(self, method: str, uri: str, body: Optional[bytes] = None) -> requests.Response:
    # Identical signature chain to _request above; only the Content-Type header differs.
    digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)
    digester.update(f"{method}{uri}".encode())
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
    digester.update(datetime_formatted[:13].encode())
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    if body is not None:
        digester.update(body)

    # Perform the request with the signed and expected headers
    return requests.request(
        method=method,
        url=self._format_url(uri),
        headers={
            "User-Agent": "bhe-python-sdk 0001",
            "Authorization": f"bhesignature {self._credentials.token_id}",
            "RequestDate": datetime_formatted,
            "Signature": base64.b64encode(digester.digest()),
            "Content-Type": "application/zip",
        },
        data=body,
    )

Also, this is the code I'm running.
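
Since _request and _requestZip are identical except for the Content-Type header, they could be collapsed into one helper that takes the content type as a parameter. A sketch of that refactor (the _signed_request name is hypothetical; everything else mirrors the code above, including its imports of hmac, hashlib, datetime, base64, requests, and typing.Optional):

    def _signed_request(self, method: str, uri: str, body: Optional[bytes] = None,
                        content_type: str = "application/json") -> requests.Response:
        """Builds the same HMAC-SHA-256 signature chain as _request/_requestZip,
        with the Content-Type header supplied by the caller."""
        digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)
        # Chain link 1: request method + URI
        digester.update(f"{method}{uri}".encode())
        digester = hmac.new(digester.digest(), None, hashlib.sha256)
        # Chain link 2: RFC3339 datetime truncated to the hour
        datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
        digester.update(datetime_formatted[:13].encode())
        digester = hmac.new(digester.digest(), None, hashlib.sha256)
        # Chain link 3: request body, if any
        if body is not None:
            digester.update(body)
        return requests.request(
            method=method,
            url=self._format_url(uri),
            headers={
                "User-Agent": "bhe-python-sdk 0001",
                "Authorization": f"bhesignature {self._credentials.token_id}",
                "RequestDate": datetime_formatted,
                "Signature": base64.b64encode(digester.digest()),
                "Content-Type": content_type,
            },
            data=body,
        )

The zip upload then becomes self._signed_request("POST", uri, body=file_content, content_type="application/zip"), and the JSON paths keep the default.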
