Get_filings method fails 403 Forbidden error when fetching N-PX filings. #208

Closed
jacob187 opened this issue Jan 24, 2025 · 6 comments

@jacob187

Description

When attempting to download N-PX filings for 2023, the script encounters two sequential failures:

  1. Initial attempt: httpx.ConnectionTimeout exception
  2. Subsequent attempt: HTTP 403 Forbidden error

Reproduction Steps:

I ran the following Python script:

from edgar import get_filings, set_identity

def download_npx_filings_year(year: int, path: str) -> None:
    filings = get_filings(year, form="N-PX")
    for filing in filings:
        filing.attachments.download(path)

if __name__ == "__main__":
    set_identity("Jacob Cohen [email protected]")
    path = "/path/to/save/files"
    download_npx_filings_year(2023, path)

Error Details

httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://www.sec.gov/Archives/edgar/full-index/2023/QTR1/form.gz.'

Environment

  • Python version: 3.11.9
  • Edgartools version: 3.8.4
  • Operating System: macOS Sequoia 15.2

Context and Questions

  • The set_identity() call appears to be properly configured.
  • The error occurs even though I am staying within the rate limit.
  • Are there any known issues with N-PX downloads?
  • Are there any recommended workarounds or solutions?
@WaloupGarou

Hey,
It seems that the set_identity() method expects an HTTP "User-Agent" header string as its argument.
I had the same issue, but the following command now works for me:
# Tell the SEC who you are
set_identity("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0")
You can try it as a workaround.

@dgunning
Owner

Jacob, is the problem still occurring?

The initial timeout attempt followed by the HTTP 403 error seems like a transient backend server failure on SEC Edgar.

I don't think it has to do with the identity, which you say is set properly.

@jacob187
Author

I am no longer getting 403 errors, but I'm now encountering different HTTP errors related to the httpcore and httpx packages:

  • The initial error is httpcore.ReadTimeout
  • The second error is httpx.ReadTimeout

The error trace suggests it is happening in the network layer, in /python3.11/site-packages/httpcore/_async/http11.py

I believe you may be right about it being a server failure as it appears to be a network-level timeout rather than an API rejection. Is this a known issue with larger downloads? I successfully downloaded all N-PX filings from 2023, but I'm only able to download 2-3GB before the connection times out. For context, the 2023 filings were approximately 13GB of data.

@dgunning
Owner

Due to the sheer number of HTTP requests, you will get ReadTimeouts.

For your use case you might be better off using LocalStorage and downloading all filings. It takes about a minute to download all filings for a given day, and it needs only 2 HTTP requests per day:

download_filings("2025-01-24")

# Download for a range
download_filings("2024-01-01:2024-12-31")

The drawback is that it downloads all forms, so it requires a lot of storage. It occurred to me today that we could download the bulk files and then use the Filings object to save only the filtered filings, so maybe we could add this feature.
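
If bulk download is not an option, the ReadTimeouts described above can also be handled by retrying the failed call. The following is a minimal sketch of a generic retry with exponential backoff. It is not part of edgartools; the helper name retry_with_backoff and its parameters are hypothetical, chosen only for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper for illustration; not an edgartools API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... between attempts
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In the download loop from this thread, the call could be wrapped as retry_with_backoff(lambda: filing.attachments.download(path), retry_on=(httpx.ReadTimeout,)). Whether retries are enough for a 13GB download is untested here; they only smooth over occasional timeouts.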

@jacob187
Author

Thanks for the suggestion. However, this approach might be too resource-intensive for my hardware limitations.

I noticed there's a filing_date parameter in get_filings(), but I'm having trouble using it. I tried to download specific filings day by day using:

filing = get_filings(filing_date="2024-01-01", form="N-PX")
filing.attachments.download(path)

But I get AttributeError: 'NoneType' object has no attribute 'attachments'

I'm still learning the library and its classes. I'm beginning to understand some components, but the library is quite comprehensive. Appreciate the help!

@dgunning
Owner

get_filings returns a Filings object containing multiple filings. Each Filing has attachments that can be downloaded.

The correct code is something like this:

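from pathlib import Path
from edgar import get_filings

filings = get_filings(filing_date="2024-01-01", form="N-PX")
if filings is not None:  # None means no filings matched (before 3.10.0)
    for filing in filings:
        # Save each filing's attachments in a folder named after its accession number
        path = Path("base_path") / filing.accession_number
        filing.attachments.download(path)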

There are no filings on 2024-01-01 (New Year's Day), which is why None was returned.
Note that version 3.10.0 fixed the issue where None is returned if no filings are found. It now returns an empty Filings object instead, so you would no longer need the "if filings is not None" check.
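
For the day-by-day approach discussed above, a small date generator keeps the loop tidy. This is only a sketch: iter_weekdays is a hypothetical helper, not an edgartools function, and it skips weekends but knows nothing about market holidays, so a date such as 2024-01-01 is still produced and can still yield no filings.

```python
from datetime import date, timedelta
from typing import Iterator


def iter_weekdays(start: date, end: date) -> Iterator[str]:
    """Yield ISO dates (YYYY-MM-DD) from start to end inclusive, skipping weekends.

    Hypothetical helper for illustration; holidays are NOT filtered out.
    """
    current = start
    while current <= end:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            yield current.isoformat()
        current += timedelta(days=1)


# Sketch of use with the calls shown in this thread (not executed here):
#
# for day in iter_weekdays(date(2024, 1, 1), date(2024, 1, 31)):
#     filings = get_filings(filing_date=day, form="N-PX")
#     if filings is None:  # possible before version 3.10.0
#         continue
#     for filing in filings:
#         filing.attachments.download(Path("base_path") / filing.accession_number)
```

Because each day is fetched separately, a run that times out can be restarted from the last completed date instead of from the beginning of the year.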
