
Bug: Uploading .json through API not working #1012

Open

F1gueron opened this issue Dec 12, 2024 · 8 comments
Labels: bug (Something isn't working), needs more info (This issue requires more information), triage (This issue requires triaging)

Comments

@F1gueron

F1gueron commented Dec 12, 2024

Description:

Hey, I'm trying to upload .json files through the API. I get a 202 status code, which suggests it worked, but on the file ingest page the job always stays in Running, or sometimes changes to Cancelled. I tried uploading the same .json manually through the UI and it works fine, so it may be my code. At first I tried uploading a single file with all the JSON in it, but that didn't work, so I started submitting every JSON file individually. If the solution lets me upload just the zip, that would be great.

Are you intending to fix this bug?

"no"

Component(s) Affected:

  • API

Steps to Reproduce:

  1. Go to [specific page or endpoint]
  2. Click on [button/element/etc.]
  3. Enter [input/data]
  4. See error at [this point]

Expected Behavior:

I expect the data to upload and be ingested correctly.

Actual Behavior:

I'm getting a 202 status code, which suggests the upload was accepted, but then this happens:

[screenshot: file ingest page showing the job stuck in Running]

Screenshots/Code Snippets/Sample Files:

    def create_upload_job(self):
        """Creates a file upload job."""
        response = self._request("POST", "/api/v2/file-upload/start")
        return response

    def upload_file(self, zip_file_path):
        """Uploads all JSON files inside a ZIP to the backend."""
        # Step 1: Extract ZIP file contents
        with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
            extract_path = os.path.splitext(zip_file_path)[0]
            zip_ref.extractall(extract_path)

        # Step 2: Process each JSON file
        for root, _, files in os.walk(extract_path):
            for file_name in files:
                if file_name.endswith('.json'):  # Only process JSON files
                    file_path = os.path.join(root, file_name)
                    with open(file_path, 'rb') as file:
                        file_content = file.read()

                    # Step 3: Create an upload job for each file
                    upload_job = self.create_upload_job()
                    upload_job_data = upload_job.json()
                    file_upload_job_id = upload_job_data['data']['id']

                    # Step 4: Upload the file content to the job
                    response = self._request(
                        method="POST",
                        uri=f"/api/v2/file-upload/{file_upload_job_id}",
                        body=file_content
                    )
                    print("Data loaded. Waiting for data to be processed...")
                    sleep(30)

                    # Log the upload result
                    if response.status_code == 202:
                        print(f"Successfully uploaded {file_name}")
                    else:
                        print(f"Failed to upload {file_name}: {response.status_code} - {response.text}")

Environment Information:

BloodHound: 6.3.0

Collector: [SharpHound version / AzureHound version]

OS: Windows 11

Browser (if UI related): [browser name and version]

Node.js (if UI related): [Node.js version]

Go (if API related): [Go version]

Database (if persistence related): [Neo4j version / PostgreSQL version]

Docker (if using Docker): 4.36.0

Additional Information:

Any additional context or information that might be helpful in understanding and diagnosing the issue.

Potential Solution (optional):

If you have any ideas about what might be causing the issue or how it could be fixed, you can share them here.

Related Issues:

If you've found related issues in the project's issue tracker, mention them here.

Contributor Checklist:

  • I have searched the issue tracker to ensure this bug hasn't been reported before or is not already being addressed.
  • I have provided clear steps to reproduce the issue.
  • I have included relevant environment information details.
  • I have attached necessary supporting documents.
  • I have checked that any JSON files I am attempting to upload to BloodHound are valid.
F1gueron added the bug and triage labels on Dec 12, 2024
@F1gueron (Author)

I found out that I have to end the upload (POST to the /end endpoint) for the upload to actually do anything. But even though I wait 50 seconds before doing a GET to list the file statuses, some files are still getting cancelled.

@StephenHinck (Contributor)

A few thoughts:

  1. Is there a reason you're extracting the files from the .zip? BHCE supports .zip ingest and will save you a step.
  2. Can you provide the relevant API logs from this time period?
  3. Which files are getting canceled, and what are their associated error messages?
  4. You are correct that the file upload must be completed with a POST request to /api/v2/file-upload/$ID/end; see the sketch below for the full lifecycle.
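
Putting the whole lifecycle together, a minimal sketch looks like the following. The client object and its _request helper are placeholders standing in for the poster's signed-request code shown later in this thread; the status_message strings and status == 3 for cancellation are taken from the snippets below, and matching on the job id assumes the entries returned by GET /api/v2/file-upload carry the same id field as the /start response:

    from time import sleep

    def upload_zip(client, zip_path):
        """Start a job, upload the .zip, end the job, then poll until it settles."""
        # 1. Create the file-upload job
        job = client._request("POST", "/api/v2/file-upload/start").json()
        job_id = job["data"]["id"]

        # 2. Send the archive body (the upload request must be signed with
        #    Content-Type: application/zip for a .zip payload)
        with open(zip_path, "rb") as fh:
            client._request("POST", f"/api/v2/file-upload/{job_id}", body=fh.read())

        # 3. End the job; ingestion does not begin until this call is made
        client._request("POST", f"/api/v2/file-upload/{job_id}/end")

        # 4. Poll the job list until this job leaves the running state
        while True:
            jobs = client._request("GET", "/api/v2/file-upload").json()["data"]
            mine = next(j for j in jobs if j["id"] == job_id)
            if mine["status_message"] in ("Complete", "Partially Completed"):
                return mine
            if mine["status"] == 3:  # cancelled
                raise RuntimeError(f"upload job {job_id} was cancelled")
            sleep(10)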

@F1gueron (Author)

F1gueron commented Dec 19, 2024

  1. I also tried uploading the .zip directly, and it never works on the first upload; I need two tries.
  2. The logs only tell me whether the upload succeeded or not, with no further detail.
  3. For me the first upload always gets cancelled, regardless of whether it's a JSON or a ZIP, so it may be worth checking whether that's a bug or just my code.

This is my code:

def upload_file(self, zip_file_path):
    file_name = zip_file_path.split("/")[-1]
    with open(zip_file_path, "rb") as file:
        file_content = file.read()

    count = 0
    uploaded = False
    i = 0
    while not uploaded:
        upload_job = self.create_upload_job()
        upload_job_data = upload_job.json()
        file_upload_job_id = upload_job_data['data']['id']

        response_upload = self._requestZip(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}",
            body=file_content
        )
        print(f"{file_name} loaded. Waiting for data to be processed...")

        response_end = self._request(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}/end"
        )
        print(f"Data processing started for {file_name}")

        threshold = 20
        sleep(threshold)
        count += threshold

        while True:
            response_list = self._request(
                method="GET",
                uri="/api/v2/file-upload"
            )
            JSON_response = response_list.json()
            if JSON_response['data'][i]["status_message"] == "Partially Completed" or JSON_response['data'][0]["status_message"] == "Complete":
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i]['status'] == 3:
                print("Data processing was canceled. Error on file: " + file_name)
                i += 1
                break
            sleep(threshold)
            count += threshold

With this code it works, but only on the second try. I can stick with this, because it's a program where efficiency isn't important; it runs once a month. Thanks.
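
One fragility worth noting in the snippet above: it locates the job by list position (data[i], and data[0] in one of the checks), but GET /api/v2/file-upload returns all jobs, so positions shift as new jobs are created. A sketch of a poller that matches on the job id returned by /start instead (assuming the list entries carry the same id field; the other field names are taken from the code above):

    def wait_for_job(self, file_upload_job_id, poll_seconds=20, timeout=300):
        """Poll GET /api/v2/file-upload until the given job completes or is cancelled."""
        waited = 0
        while waited <= timeout:
            jobs = self._request(method="GET", uri="/api/v2/file-upload").json()['data']
            # Match on the job id instead of assuming a position in the list
            job = next((j for j in jobs if j['id'] == file_upload_job_id), None)
            if job is None:
                raise RuntimeError(f"Job {file_upload_job_id} not found in the job list")
            if job['status_message'] in ("Complete", "Partially Completed"):
                return job
            if job['status'] == 3:  # cancelled
                raise RuntimeError(f"Job {file_upload_job_id} was cancelled")
            sleep(poll_seconds)
            waited += poll_seconds
        raise TimeoutError(f"Job {file_upload_job_id} still running after {timeout}s")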

@StephenHinck (Contributor)

If you upload that same file via the UI, does it work fine there, or is it the same behavior?

@F1gueron (Author)

It works fine in the UI. It's only through the API that everything I upload works on the second try.

@StephenHinck (Contributor)

I asked one of our engineers to double-check this thread, and what you're doing appears to be correct. However, without additional logs or a view of the full code snippet you're running, it will be difficult for us to help you troubleshoot further.

StephenHinck added the needs more info label on Dec 20, 2024
@F1gueron (Author)

F1gueron commented Jan 9, 2025

Sorry for the delay, I was on vacation. This is the log info from the Docker logs:

2025-01-08 10:20:36 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:34.267421+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:33706","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"9f487a58-d9b5-439b-ad79-8f1eb9d7d426","request_bytes":135,"response_bytes":0,"status":204,"elapsed":55.816438,"time":"2025-01-08T09:20:36.403486091Z","message":"POST /api/v2/clear-database"}
2025-01-08 10:20:38 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:36.416511+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:33714","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"03f3f571-59d2-4b94-beb5-5b5132c1d2b9","request_bytes":0,"response_bytes":242,"status":201,"elapsed":36.096702,"time":"2025-01-08T09:20:38.520172569Z","message":"POST /api/v2/file-upload/start"}
2025-01-08 10:20:40 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:38.524101+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:60860","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"1b943a91-b714-461f-8ea5-2ea62fdd8fc9","request_bytes":2357123,"response_bytes":23,"status":202,"elapsed":38.98706,"time":"2025-01-08T09:20:40.616765926Z","message":"POST /api/v2/file-upload/179"}
2025-01-08 10:20:42 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:40.620114+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:60862","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"eb90e3e4-fc01-4711-a131-a5882477e903","request_bytes":0,"response_bytes":23,"status":200,"elapsed":13.837052,"time":"2025-01-08T09:20:42.69099955Z","message":"POST /api/v2/file-upload/179/end"}
2025-01-08 10:20:54 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:20:52.695233+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:36074","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"5a4e8bd1-7efb-4c02-92d3-0be006ca2801","request_bytes":0,"response_bytes":286,"status":200,"elapsed":9.800199,"time":"2025-01-08T09:20:54.741412353Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:06 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:04.743364+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:37594","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"90867637-a157-4347-981e-aa44c7a7a68e","request_bytes":0,"response_bytes":286,"status":200,"elapsed":9.302025,"time":"2025-01-08T09:21:06.792802522Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:18 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:16.794142+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:46976","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"5cabb1b5-68ed-4e5d-aecb-0f75dd12c9d0","request_bytes":0,"response_bytes":286,"status":200,"elapsed":7.73046,"time":"2025-01-08T09:21:18.837364685Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:21 {"level":"info","time":"2025-01-08T09:21:21.219227666Z","message":"Begin Purge Graph Data"}
2025-01-08 10:21:30 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:28.840273+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48348","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"ff14e0a3-9862-4956-992a-acb97d559645","request_bytes":0,"response_bytes":287,"status":200,"elapsed":9.071671,"time":"2025-01-08T09:21:30.906416528Z","message":"GET /api/v2/file-upload"}
2025-01-08 10:21:32 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:30.910200+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48352","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"4cee149c-12fc-482b-8e0d-c5116689c644","request_bytes":0,"response_bytes":241,"status":201,"elapsed":24.422103,"time":"2025-01-08T09:21:32.977062734Z","message":"POST /api/v2/file-upload/start"}
2025-01-08 10:21:35 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:32.977797+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48356","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"4fde5bb1-4e14-4d6a-b6a1-61bd0e2a68c1","request_bytes":2357123,"response_bytes":23,"status":202,"elapsed":34.851713,"time":"2025-01-08T09:21:35.080274894Z","message":"POST /api/v2/file-upload/180"}
2025-01-08 10:21:37 {"level":"info","user_id":"c9ff6420-6e75-4c67-a893-2c17250d27e1","signed_request_date":"2025-01-08T10:21:35.081446+01:00","token_id":"177c1f4a-4c3e-49c8-8a84-73e8e5bb1ca6","remote_addr":"172.18.0.1:48362","proto":"HTTP/1.1","referer":"","user_agent":"bhe-python-sdk 0001","request_id":"c60b012a-2fd5-497e-aec8-e6d2e8c2ba19","request_bytes":0,"response_bytes":23,"status":200,"elapsed":8.922208,"time":"2025-01-08T09:21:37.13901891Z","message":"POST /api/v2/file-upload/180/end"}
2025-01-08 10:21:46 {"level":"info","elapsed":25361.600634,"time":"2025-01-08T09:21:46.57344101Z","message":"Purge Graph Data Completed"}

As you can see, there are two upload attempts, but I can't find any info on why the first one fails.

@F1gueron (Author)

F1gueron commented Jan 9, 2025

def create_upload_job(self):
    """Creates a file upload job."""
    response = self._request("POST", "/api/v2/file-upload/start")
    return response

def upload_file(self, zip_file_path):
    file_name = zip_file_path.split("/")[-1]
    with open(zip_file_path, "rb") as file:
        file_content = file.read()

    uploaded = False
    i = 0
    while not uploaded:
        count = 0
        upload_job = self.create_upload_job()
        upload_job_data = upload_job.json()
        file_upload_job_id = upload_job_data['data']['id']

        response_upload = self._requestZip(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}",
            body=file_content
        )
        print(f"{file_name} loaded. Waiting for data to be processed...")

        response_end = self._request(
            method="POST",
            uri=f"/api/v2/file-upload/{file_upload_job_id}/end"
        )
        print(f"Data processing started for {file_name}")

        threshold = 10
        sleep(threshold)
        count += threshold

        while True:
            print(f"Waiting for data to be processed... {count}s")
            response_list = self._request(
                method="GET",
                uri="/api/v2/file-upload"
            )
            JSON_response = response_list.json()
            if JSON_response['data'][i]["status_message"] in ("Partially Completed", "Complete", "Analyzing"):
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i - 1]["status_message"] in ("Partially Completed", "Complete", "Analyzing"):
                print(f"Successfully uploaded {file_name}")
                uploaded = True
                break
            if JSON_response['data'][i]['status'] == 3:
                print("Data processing was canceled. Error on file: " + file_name)
                i += 1
                break
            sleep(threshold)
            count += threshold
            if count > 300:
                print("Data processed timeout on file: " + file_name)
                break
                
def _request(self, method: str, uri: str, body: Optional[bytes] = None) -> requests.Response:
    # Digester is initialized with HMAC-SHA-256 using the token key as the HMAC digest key.
    digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)

    # OperationKey is the first HMAC digest link in the signature chain. This prevents replay attacks that seek to
    # modify the request method or URI. It is composed of concatenating the request method and the request URI with
    # no delimiter and computing the HMAC digest using the token key as the digest secret.
    #
    # Example: GET /api/v1/test/resource HTTP/1.1
    # Signature Component: GET/api/v1/test/resource
    digester.update(f"{method}{uri}".encode())

    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)

    # DateKey is the next HMAC digest link in the signature chain. This encodes the RFC3339 formatted datetime
    # value as part of the signature to the hour to prevent replay attacks that are older than max two hours. This
    # value is added to the signature chain by cutting off all values from the RFC3339 formatted datetime from the
    # hours value forward:
    #
    # Example: 2020-12-01T23:59:60Z
    # Signature Component: 2020-12-01T23
    datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
    digester.update(datetime_formatted[:13].encode())

    # Update the digester for further chaining
    digester = hmac.new(digester.digest(), None, hashlib.sha256)

    # Body signing is the last HMAC digest link in the signature chain. This encodes the request body as part of
    # the signature to prevent replay attacks that seek to modify the payload of a signed request. In the case
    # where there is no body content the HMAC digest is computed anyway, simply with no values written to the
    # digester.
    if body is not None:
        digester.update(body)

    # Perform the request with the signed and expected headers
    return requests.request(
        method=method,
        url=self._format_url(uri),
        headers={
            "User-Agent": "bhe-python-sdk 0001",
            "Authorization": f"bhesignature {self._credentials.token_id}",
            "RequestDate": datetime_formatted,
            "Signature": base64.b64encode(digester.digest()),
            "Content-Type": "application/json",
        },
        data=body,
    )

def _requestZip(self, method: str, uri: str, body: Optional[bytes] = None) -> requests.Response:
    # Identical signature chain to _request above; only the Content-Type header differs.
    digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)
    digester.update(f"{method}{uri}".encode())
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
    digester.update(datetime_formatted[:13].encode())
    digester = hmac.new(digester.digest(), None, hashlib.sha256)
    if body is not None:
        digester.update(body)

    # Perform the request with the signed and expected headers
    return requests.request(
        method=method,
        url=self._format_url(uri),
        headers={
            "User-Agent": "bhe-python-sdk 0001",
            "Authorization": f"bhesignature {self._credentials.token_id}",
            "RequestDate": datetime_formatted,
            "Signature": base64.b64encode(digester.digest()),
            "Content-Type": "application/zip",
        },
        data=body,
    )

Also, this is the code I'm running.
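
Since _request and _requestZip are identical except for the Content-Type header, they could be collapsed into one helper that takes the content type as a parameter. A sketch of that refactor (the _signed_request name is hypothetical; everything else mirrors the code above, including its imports of hmac, hashlib, datetime, base64, requests, and typing.Optional):

    def _signed_request(self, method: str, uri: str, body: Optional[bytes] = None,
                        content_type: str = "application/json") -> requests.Response:
        """Builds the same HMAC-SHA-256 signature chain as _request/_requestZip,
        with the Content-Type header supplied by the caller."""
        digester = hmac.new(self._credentials.token_key.encode(), None, hashlib.sha256)
        # Chain link 1: request method + URI
        digester.update(f"{method}{uri}".encode())
        digester = hmac.new(digester.digest(), None, hashlib.sha256)
        # Chain link 2: RFC3339 datetime truncated to the hour
        datetime_formatted = datetime.datetime.now().astimezone().isoformat("T")
        digester.update(datetime_formatted[:13].encode())
        digester = hmac.new(digester.digest(), None, hashlib.sha256)
        # Chain link 3: request body, if any
        if body is not None:
            digester.update(body)
        return requests.request(
            method=method,
            url=self._format_url(uri),
            headers={
                "User-Agent": "bhe-python-sdk 0001",
                "Authorization": f"bhesignature {self._credentials.token_id}",
                "RequestDate": datetime_formatted,
                "Signature": base64.b64encode(digester.digest()),
                "Content-Type": content_type,
            },
            data=body,
        )

The zip upload then becomes self._signed_request("POST", uri, body=file_content, content_type="application/zip"), and the JSON paths keep the default.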
