get_filings method fails with 403 Forbidden error when fetching N-PX filings #208

jacob187 · 2025-01-24T22:05:01Z

Description

When attempting to download N-PX filings for 2023, the script encounters two sequential failures:

Initial attempt: httpx.ConnectTimeout exception
Subsequent attempt: HTTP 403 Forbidden error

Reproduction Steps:

I ran the following python script:

from edgar import get_filings, set_identity

def download_npx_filings_year(year: int, path: str) -> None:
    filings = get_filings(year, form="N-PX")
    for filing in filings:
        filing.attachments.download(path)

if __name__ == "__main__":
    set_identity("Jacob Cohen [email protected]")
    path = "/path/to/save/files"
    download_npx_filings_year(2023, path)

Error Details

httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://www.sec.gov/Archives/edgar/full-index/2023/QTR1/form.gz'

Environment

Python version: 3.11.9
Edgartools version: 3.8.4
Operating System: macOS Sequoia 15.2

Context and Questions

The set_identity() method appears to be properly configured.
The error occurs despite staying within the rate limit.
Are there any known issues with N-PX downloads?
Are there any recommended workarounds or solutions?
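For context on the rate-limit question: the SEC's fair-access guidance asks automated clients to stay at or below 10 requests per second. edgartools may already pace its own requests, so the sketch below only matters for extra requests you make yourself. It is a generic minimum-interval throttle, not the library's internal mechanism, and RequestThrottle is a hypothetical name.

```python
import threading
import time
from typing import Callable


class RequestThrottle:
    """Enforce a minimum interval of 1/max_per_second between request starts."""

    def __init__(
        self,
        max_per_second: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()  # serializes callers across threads
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request slot is free, then claim it."""
        with self._lock:
            now = self._clock()
            if now < self._next_allowed:
                # Too soon since the previous request: sleep out the remainder.
                self._sleep(self._next_allowed - now)
                now = self._next_allowed
            self._next_allowed = now + self.min_interval
```

Call throttle.wait() immediately before each request. Holding the lock while sleeping is deliberate: concurrent callers queue up and are released one interval apart.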


WaloupGarou · 2025-01-24T22:52:06Z

Hey,
It seems that the set_identity() method expects you to pass an HTTP "User-Agent" header string as its argument.
I had the same issue, but now the following works:

# Tell the SEC who you are
set_identity("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:134.0) Gecko/20100101 Firefox/134.0")

You can try it as a workaround.

dgunning · 2025-01-25T01:24:09Z

Jacob, is the problem still occurring?

The initial timeout attempt followed by the HTTP 403 error seems like a transient backend server failure on SEC Edgar.

I don't think it has to do with your identity, which you say is set properly.
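If the failure really is transient, a common workaround is to retry the failing call with exponential backoff. A minimal sketch, assuming each download is wrapped in a zero-argument callable; retry_with_backoff is a hypothetical helper, not part of edgartools:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # plus random jitter so parallel clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

For example, retry_with_backoff(lambda: filing.attachments.download(path), retry_on=(httpx.TimeoutException, httpx.HTTPStatusError)). httpx.TimeoutException is the base class of both ConnectTimeout and ReadTimeout. A 403 caused by a rejected identity will not clear on retry, so keep max_attempts small.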

jacob187 · 2025-01-25T16:33:45Z

I am no longer getting 403 errors, but I'm now encountering different HTTP errors related to the httpcore and httpx packages:

The initial error is httpcore.ReadTimeout
The second error is httpx.ReadTimeout

The error trace suggests it is happening in the network layer, in /python3.11/site-packages/httpcore/_async/http11.py

I believe you may be right about it being a server failure, as it appears to be a network-level timeout rather than an API rejection. Is this a known issue with larger downloads? I successfully downloaded all N-PX filings from 2023, but I'm only able to download 2-3 GB before the connection times out. For context, the 2023 filings were approximately 13 GB of data.
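Since the download dies partway through a roughly 13 GB run, one way to keep making progress across restarts is to give each filing its own folder and skip folders already marked complete. A minimal sketch; download_missing and the .complete marker file are hypothetical conventions, and the download callback stands in for whatever per-filing call you already use (for example, one that calls filing.attachments.download(target)).

```python
from pathlib import Path
from typing import Callable, Iterable, List


def download_missing(
    accession_numbers: Iterable[str],
    base_path: Path,
    download: Callable[[str, Path], None],
) -> List[str]:
    """Download each accession into base_path/<accession>, skipping finished ones.

    A folder counts as finished only after the download call returns and a
    marker file is written, so a filing interrupted mid-download is retried.
    """
    downloaded: List[str] = []
    for accession in accession_numbers:
        target = base_path / accession
        marker = target / ".complete"  # hypothetical marker-file convention
        if marker.exists():
            continue  # finished in a previous run
        target.mkdir(parents=True, exist_ok=True)
        download(accession, target)
        marker.touch()
        downloaded.append(accession)
    return downloaded
```

Rerunning the same script after a timeout then picks up where the previous run stopped instead of starting over.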

dgunning · 2025-01-26T01:52:45Z

Due to the sheer number of HTTP requests, you will get ReadTimeouts.

For your use case, you might be better off using LocalStorage and downloading all filings. It takes about a minute to download all filings for a given day, and it's 2 HTTP requests per day.

# Download all filings for a single day
download_filings("2025-01-24")

# Download for a range
download_filings("2024-01-01:2024-12-31")

The drawback is that it downloads all forms, so it requires a lot of storage. It occurred to me today that we could download the bulk files and then use the Filings object to save only the filtered filings, so maybe we could add this feature.

jacob187 · 2025-01-26T17:07:31Z

Thanks for the suggestion. However, this approach might be too resource-intensive for my hardware limitations.

I noticed there's a filing_date parameter in get_filings(), but I'm having trouble using it. I tried to download specific filings day by day using:

filing = get_filings(filing_date="2024-01-01", form="N-PX")
filing.attachments.download(path)

But I get AttributeError: 'NoneType' object has no attribute 'attachments'

I'm still learning the library and its classes. I'm beginning to understand some components, but the library is quite comprehensive. Appreciate the help!
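Downloading day by day means one get_filings call per filing date. A minimal sketch of generating those dates, assuming no filings are dated on weekends (holidays such as New Year's Day will still return no filings); weekdays is a hypothetical helper:

```python
from datetime import date, timedelta
from typing import Iterator


def weekdays(start: date, end: date) -> Iterator[str]:
    """Yield ISO dates (YYYY-MM-DD) from start to end inclusive, skipping weekends."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0=Monday ... 4=Friday
            yield day.isoformat()
        day += timedelta(days=1)
```

Each yielded string can be passed as get_filings(filing_date=day, form="N-PX"). For all of 2024 this yields 262 dates, before subtracting holidays.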

dgunning · 2025-01-30T13:35:16Z

get_filings returns a Filings object containing multiple filings. Each Filing has attachments that can be downloaded.

The correct code is something like this:

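from pathlib import Path

from edgar import get_filings

filings = get_filings(filing_date="2024-01-01", form="N-PX")
if filings is not None:
    for filing in filings:
        # Save each filing's attachments in its own folder, keyed by accession number
        path = Path("base_path") / filing.accession_number
        filing.attachments.download(path)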

There are no filings on 2024-01-01 (New Year's Day), which is why None was returned.
Note that version 3.10.0 fixed an issue where None was returned if no filings were found. It now returns an empty Filings object instead, so you would not need the if filings is not None check.
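Code that has to run on edgartools versions both before and after 3.10.0 can treat None and an empty result the same way. A minimal sketch; iter_filings is a hypothetical helper, not part of the library:

```python
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def iter_filings(filings: Optional[Iterable[T]]) -> Iterator[T]:
    """Iterate over a get_filings() result, treating None (pre-3.10.0) as empty."""
    if filings is None:
        return iter(())
    return iter(filings)
```

With this, the loop becomes for filing in iter_filings(get_filings(...)), with no separate None check.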

dgunning closed this as completed Feb 11, 2025
