
Need help pulling 10 years of income statements, balance sheets, and cash flow statements #111

Closed
unparadise opened this issue Sep 22, 2024 · 5 comments
Labels
question Further information is requested


@unparadise

I am writing a script to pull 10 years of income statements, balance sheets, and cash flow statements based on a ticker parameter and encountered a few issues.

  1. I noticed that the dataframe created from financials.get_income_statement() has four empty rows at the top: 'Income Statement [Abstract]', 'Statement [Table]', 'Product and Service', and 'Statement [Line Items]'. Below is my code.
company = Company('aapl')
ten_k = company.get_filings(form="10-K").latest(1).obj()
financials = ten_k.financials
income_statement_df = financials.get_income_statement().get_dataframe()
print(income_statement_df)

Should these empty rows be removed from the returned object?
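
If the goal is only to hide those header rows, one option is to filter them out after the fact. Below is a minimal pandas sketch, assuming the empty rows hold only NaN/None or blank strings (the thread does not confirm how they are stored). `drop_empty_rows` is a hypothetical helper, not part of the library, and the sample frame uses made-up numbers.

```python
import pandas as pd


def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return df without rows whose cells are all missing or blank strings."""
    def is_blank(value) -> bool:
        # Treat NaN/None and whitespace-only strings as "empty".
        return bool(pd.isna(value)) or (isinstance(value, str) and value.strip() == "")

    blank = df.apply(lambda column: column.map(is_blank))
    keep = ~blank.all(axis=1).astype(bool)
    return df.loc[keep]


# Made-up stand-in for an income statement dataframe.
sample = pd.DataFrame(
    {"2024": [None, "", 100.0, 60.0], "2023": [None, None, 90.0, 55.0]},
    index=["Income Statement [Abstract]", "Statement [Table]", "Net sales", "Cost of sales"],
)
cleaned = drop_empty_rows(sample)
print(cleaned)
```

Filtering on the values rather than on the row labels avoids hard-coding the four label names, which may differ between companies.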

  2. When I tried to pull 10 years of income statements for MSFT, I encountered an error that says 'ValueError: Length mismatch: Expected axis has 21 elements, new values have 22 elements'. Below is my code.
import pandas as pd
from edgar import *

def get_financial_statements(ticker):
    year = 10
    set_identity("blah blah [email protected]")
    company = Company('MSFT')

    def get_income_statements():
        ten_ks = company.get_filings(form="10-K").latest(year)
        income_statement_df = pd.DataFrame()

        income_statement_df = income_statement_df.iloc[:, :-2]

        i = 0
        for ten_k in ten_ks:
            financials = ten_k.obj().financials
            income_statement_df = financials.get_income_statement().get_dataframe()
            if (i == 0):
                income_statement_df = pd.concat([income_statement_df, income_statement_df[income_statement_df.columns]], axis=1)
        i = i + 1
        print(income_statement_df)

    get_income_statements()

def main():
    get_financial_statements('aapl')

if __name__ == '__main__':
    main()

The full error message is pasted below.

Traceback (most recent call last):
  File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 147, in <module>
    main()
  File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 143, in main
    get_financial_statements(args.ticker, statement)
  File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 54, in get_financial_statements
    get_income_statements()
  File "/Users/liangchen/Github/coding/stocks/get_fs_SEC.py", line 45, in get_income_statements
    income_statements_df.index=income_statement_df.index
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/generic.py", line 6313, in __setattr__
    return object.__setattr__(self, name, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/generic.py", line 814, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 238, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pandas/core/internals/base.py", line 98, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 21 elements, new values have 22 elements
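
For context, pandas raises this error whenever an index is assigned whose length differs from the number of rows, which is what happens when one year's statement has 21 rows and another has 22. The sketch below reproduces the error on toy data and shows one alternative, `pd.concat(..., axis=1)`, which aligns rows by label instead of by position. The frames and their values are made up for illustration.

```python
import pandas as pd

# Two toy "statements" with different numbers of rows (made-up values).
newer = pd.DataFrame(
    {"2024": [10.0, 4.0, 20.0]},
    index=["Revenue", "Research and development", "Net income"],
)
older = pd.DataFrame({"2015": [1.0, 2.0]}, index=["Revenue", "Net income"])

# Overwriting the index with one of a different length raises ValueError.
error_message = None
try:
    older.index = newer.index
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Concatenating along columns aligns on the row labels (outer join),
# leaving NaN where a label is absent from one of the frames.
combined = pd.concat([newer, older], axis=1)
print(combined)
```

Label alignment only helps when the same line item carries the same label in every year, which is not guaranteed across filings.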

Thank you for your help in advance!
@dgunning
Owner

So I actually implemented this last week but have been slow rolling it out.

from edgar import *

company = Company("MSFT")
filings = company.get_filings(form="10-K").latest(9)

financials = MultiFinancials(filings)
financials.get_balance_sheet()
financials.get_cash_flow_statement()
financials.get_income_statement()

@unparadise
Author

Thank you for your prompt reply, dgunning! I tried it and it works. Thank you!

But I noticed that some values are incorrect. For example, when pulling 10 years of income statements for MSFT, the revenue row shows <N/A> for 2015 and 2014.

image

Also, I noticed that many line items, such as R&D expense and SG&A expense, are missing from the dataframe. Is this by design?

image

@Colem19

Colem19 commented Sep 24, 2024

I think it has to do with the concatenation of the rows/dataframes. This might be due to line items having slightly different text from one year to another. There might be a way to handle this by removing the index for each year and creating a new one. The row order is kind of important in this process, so I wonder what the best way to do it would be.

When I build Excel models, I add rows as accounts come into use and always keep them, even if they were removed in later years. There might be a workaround using some kind of SUMIF, or a specific mapping of the accounts that each year's labels refer to. But it might be hard with all the different ways financials are presented.
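
The mapping idea could look roughly like the following pandas sketch. Everything here is an assumption for illustration: `LABEL_MAP`, `normalize`, and `combine_years` are hypothetical names, the label variants are invented, and the numbers are made up. Each year's labels are renamed to a canonical account, rows that map to the same account are summed (the SUMIF step), and the years are then joined side by side.

```python
import pandas as pd

# Hypothetical mapping from label variants to one canonical account name.
LABEL_MAP = {
    "Revenue": "Revenue",
    "Total revenue": "Revenue",
    "Research and development": "R&D expense",
    "Research and development expense": "R&D expense",
}


def normalize(statement: pd.Series) -> pd.Series:
    """Rename labels via LABEL_MAP and sum rows that share a canonical name."""
    renamed = statement.rename(index=lambda label: LABEL_MAP.get(label, label))
    return renamed.groupby(level=0, sort=False).sum()


def combine_years(statements: dict) -> pd.DataFrame:
    """Build one column per year, aligned on the canonical account names."""
    return pd.concat({year: normalize(s) for year, s in statements.items()}, axis=1)


# Made-up statements whose labels differ between years.
fy2024 = pd.Series({"Total revenue": 100.0, "Research and development": 10.0})
fy2015 = pd.Series({"Revenue": 50.0, "Research and development expense": 5.0})
table = combine_years({"2024": fy2024, "2015": fy2015})
print(table)
```

Labels missing from the mapping pass through unchanged, so unmapped accounts still show up (with NaN in the years that lack them) and can be added to the mapping later.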

@dgunning dgunning added the question Further information is requested label Oct 2, 2024
@dgunning
Owner

dgunning commented Jan 5, 2025

This will be hard to fix with the current algorithm. It works by starting with the specific rows that are in the most recent statement, e.g. gaap_RevenueFromContractWithCustomer, and looking for those values in earlier filings. If an item is not there, it will show as NA.

Probably the way to fix this is with semantic joins using AI. I will play around with this and maybe write a how-to.
But I likely won't fix the current algorithm.
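
To illustrate why the NA values appear, here is a rough pandas sketch of the behavior described above. It is not the library's actual implementation: `stitch_by_latest` is a hypothetical function, the values are made up, and `gaap_OlderRevenueConcept` is an invented placeholder for whatever concept an older filing used.

```python
import pandas as pd


def stitch_by_latest(statements: list) -> pd.DataFrame:
    """Align every statement to the concepts of the newest one.

    statements: Series indexed by concept name, newest first,
    with Series.name holding the fiscal year.
    """
    latest_concepts = statements[0].index
    # reindex keeps only the newest statement's concepts and yields NaN
    # wherever an earlier filing lacks that concept.
    return pd.DataFrame({s.name: s.reindex(latest_concepts) for s in statements})


# Made-up data: the older filing reports revenue under a different concept.
fy2024 = pd.Series(
    {"gaap_RevenueFromContractWithCustomer": 100.0, "gaap_NetIncomeLoss": 30.0},
    name="2024",
)
fy2015 = pd.Series(
    {"gaap_OlderRevenueConcept": 50.0, "gaap_NetIncomeLoss": 10.0},
    name="2015",
)
table = stitch_by_latest([fy2024, fy2015])
print(table)
```

In this sketch the 2015 revenue figure exists in the older statement but is dropped, because its concept name does not appear in the newest statement.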

@dgunning dgunning closed this as completed Jan 5, 2025
@Colem19

Colem19 commented Jan 6, 2025

I guess it might be helpful to have the full statement for each year, with the labels and years showing for each statement. That way, a manual fix could easily be done.

Thanks!
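
One way to picture that suggestion is a long table that keeps every year's own labels, so nothing is dropped and mismatches can be fixed by hand. This is a minimal pandas sketch with made-up data. `stack_statements` is a hypothetical helper, and the per-year Series stand in for whatever each filing's statement returns.

```python
import pandas as pd


def stack_statements(statements: dict) -> pd.DataFrame:
    """Stack per-year statements into rows of (year, label, value)."""
    stacked = pd.concat(statements, names=["year", "label"])
    return stacked.rename("value").reset_index()


# Made-up statements; each year keeps its original labels.
fy2024 = pd.Series({"Total revenue": 100.0, "Research and development": 10.0})
fy2015 = pd.Series({"Revenue": 50.0, "Research and development expense": 5.0})
long_table = stack_statements({"2024": fy2024, "2015": fy2015})
print(long_table)
```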
