An hledger/ledger-cli workflow focusing on automated statement import and classification

apauley/hledger-flow

Hledger Flow

What is it?

hledger-flow is a command-line program that gives you a guided Hledger workflow. It is important to note that most of the heavy lifting is done by the upstream hledger project. For example, hledger-flow cares about where you put your files for long-term maintainability, but the actual conversion to classified accounting journals is done by hledger.

hledger-flow focuses on automated processing of electronic statements as much as possible, as opposed to manually adding your own hledger journal entries. Manual entries are still possible, but it saves time in the long run to automatically process a statement whenever one is available.

Within hledger-flow you will keep your original bank statements around permanently as input, and generate classified Hledger journals each time you run the program. The classification is done with hledger’s rules files, and/or your own script hooks.

Keeping the original statements means that you never have to worry too much about “am I doing this accounting thing right?” or “what happens if I make a mistake?”. If you want to change your mind about some classification, or if you made a mistake, you just change your classification rules, and run the program again.

It started when I realized that the scripts I wrote while playing around with adept’s Full-fledged Hledger aren’t really specific to my own finances, and can be shared.

Overview of the Basic Workflow

  1. Save an input transaction file (typically CSV) to a specific directory.
  2. Add an hledger rules file. Include some classification rules if you want.
  3. Run hledger-flow import

Add all your files to your favourite version control system.

The generated journal that you most likely want to use as your LEDGER_FILE is called all-years.journal. This has include directives to all the automatically imported journals, as well as includes for your own manually managed journal entries.

In a typical software project we don’t add generated files to version control, but in this case I think it is a good idea to add all the generated files to version control as well. When you inevitably change something, e.g. how you classify transactions in your rules file, you can easily see whether your change had the desired effect by looking at a diff.

Who should use this?

hledger-flow is intended for you if:

  • You are interested in getting started with hledger or ledger-cli and you wouldn’t mind pointers to the right docs along the way.
  • You want a way to organise your finances into a structure that will be maintainable over the long term.
  • You want to automate as much as possible when dealing with your financial life.
  • You don’t mind writing some scripts when needed, as long as it saves you time over the long term.
  • You want the ability to model your entire financial life in one tool, as opposed to just the parts that some online tool currently supports.
  • You appreciate the fact that all your financial information stays within your control.

About This Documentation

The rest of this file documents how to use hledger-flow, and it has probably outgrown what should be in a README file.

If you can spare some time to contribute to this project, please consider converting these docs to something more suitable, such as Read the Docs.

How do I install it?

The easiest way to get it running is to download the latest release for your OS (Linux or Mac OS X), and copy the hledger-flow executable to a directory in your PATH. Then just run it and see what it tells you to do.

You can also compile it yourself by following the build instructions.

Windows Support

Currently hledger-flow does not work on Windows.

This list of issues describes some of the details of what doesn’t work.

I believe it wouldn’t take too much effort to fix those issues, but I’m going to leave Windows support for other contributors.

Please send me some pull requests if you would like hledger-flow to work on Windows.

How Stable is it?

We’re not close to a 1.0 release yet, which means that we can still make changes if needed.

As an example, the command-line switches we use will probably change over time. Some switches change the behaviour of the program, and the default behaviour will probably change between releases. The names of these command-line options can change, or options can be removed when they are no longer needed.

That being said, some parts have been used and tested extensively and are likely to remain stable. Have a look at the “Stability of this Feature” sections in the feature reference below.

I add future work, ideas and thoughts as GitHub issues and in TODO.org, so have a look there for more clues as to what is likely to change.

Let me know if you can think of some improvements.

Getting Started

Have a look at the detailed step-by-step instructions.

You can see the example imported financial transactions, as they were generated by the step-by-step instructions, here:

https://github.com/apauley/hledger-flow-example

Feature Reference

Input Files

Your input files will probably be CSV files with a line for each transaction, although other file types will work fine if you use a preprocess or a construct script that can read them. These scripts are explained later.

We mostly use conventions based on a predefined directory structure for your input statements.

For example, assuming you have a savings account at mybank, you’ll put your first CSV statement here: import/john/mybank/savings/1-in/2018/123456789_2018-06-30.csv.

Some people may want to include accounts belonging to their spouse as part of the household finances: import/spouse/otherbank/checking/1-in/2018/987654321_2018-06-30.csv.

More About Input Files

All files and directories under the import directory are related to the automatic importing and classification of transactions.

The directory directly under import is meant to indicate the owner or custodian of the accounts below it. It mostly has an impact on reporting. You may want to have separate reports for import/mycompany and import/personal.

Below the directory for the owner we can indicate where an account is held. For a bank account you may choose to name it import/john/mybank.

If your underground bunker filled with gold has CSV statements linked to it, then you can absolutely create import/john/secret-treasure-room.

Under the directory for the financial institution, you’ll have a directory for each account at that institution, e.g. import/mycompany/bigbankinc/customer-deposits and import/mycompany/bigbankinc/expense-account.

Next you’ll create a directory named 1-in. This is to distinguish it from 2-preprocessed and 3-journal which will be auto-generated later.

Under 1-in you’ll create a directory for the year, e.g. 2018, and within that you can copy the statements for that year: import/john/mybank/savings/1-in/2018/123456789_2018-06-30.csv
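
The directory convention can be illustrated with a small Python sketch that splits a statement path into its parts. This is only an illustration of the layout described above, not code from hledger-flow; the names StatementPath and parse_statement_path are hypothetical.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class StatementPath(NamedTuple):
    owner: str
    bank: str
    account: str
    year: str
    filename: str


def parse_statement_path(path: str) -> StatementPath:
    """Split an input statement path into owner, bank, account, year and file name."""
    parts = PurePosixPath(path).parts
    # Expected layout: import/<owner>/<bank>/<account>/1-in/<year>/<file>
    if len(parts) != 7 or parts[0] != "import" or parts[4] != "1-in":
        raise ValueError(f"not an input statement path: {path}")
    _, owner, bank, account, _, year, filename = parts
    return StatementPath(owner, bank, account, year, filename)
```

For the savings example above, parse_statement_path returns the owner john, the bank mybank, the account savings and the year 2018.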

Stability of this Feature

The basic owner/bank/account/year structure has been used and tested fairly extensively, and I don’t expect a need for it to change.

I’m open to suggestions for improvement though.

Rules Files

If your input file is in CSV format, or converted to CSV by your preprocess script, then you’ll need an hledger rules file.

hledger-flow will try to find a rules file for each statement in a few places. The same rules file is typically used for all statements of a specific account, or even for all accounts of the same specific bank.

  • A global rules file for any mybank statement can be saved here: import/mybank.rules
  • A rules file for all statements of a specific account: import/spouse/bigbankinc/savings/bigbankinc-savings.rules

Statement-specific Rules Files

What happens if some of the statements for an account have a different format than the others?

This can happen if you normally get your statements directly from your bank, but some statements you had to download from somewhere else, like Mint, because your bank is being daft with older statements.

In order to tell hledger-flow that you want to override the rules file for a specific statement, you need to add a suffix, separated by an underscore (_) and starting with the letters rfo (rules file override) to the filename of that statement.

For example, assuming you’ve named your statement 99966633_20171223_1844_rfo-mint.csv, hledger-flow will look for a rules file named rfo-mint.rules in the following places:

  • in the import directory, e.g. import/rfo-mint.rules
  • in the bank directory, e.g. import/john/mybank/rfo-mint.rules
  • in the account directory, e.g. import/john/mybank/savings/rfo-mint.rules
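
The naming convention above can be sketched in Python. This is an illustration only, not hledger-flow’s actual lookup code: the function name rules_override_candidates is hypothetical, the year directory in the usage example is assumed, and the order of the returned list simply follows the bullets above without implying any precedence.

```python
from pathlib import PurePosixPath


def rules_override_candidates(statement_path: str) -> list[str]:
    """List where an rfo rules file would be looked for, or [] without an rfo suffix."""
    path = PurePosixPath(statement_path)
    # The override name is the last underscore-separated part of the file name,
    # ignoring the file extension.
    suffix = path.stem.rsplit("_", 1)[-1]
    if not suffix.startswith("rfo"):
        return []
    rules_name = suffix + ".rules"
    # Assumed layout: import/<owner>/<bank>/<account>/1-in/<year>/<file>
    import_dir, owner, bank, account = path.parts[:4]
    return [
        str(PurePosixPath(import_dir, rules_name)),
        str(PurePosixPath(import_dir, owner, bank, rules_name)),
        str(PurePosixPath(import_dir, owner, bank, account, rules_name)),
    ]
```

Calling it with import/john/mybank/savings/1-in/2017/99966633_20171223_1844_rfo-mint.csv yields the three rfo-mint.rules locations listed above, while a statement without an rfo suffix yields an empty list.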

Example rules file usage

A common scenario is multiple accounts that share the same file format, but have different account1 directives.

One possible approach would be to include a shared rules file in your account-specific rules file.

If you are lucky enough that all statements at mybank share a common format across all accounts, then you can include a rules file that just defines the parts that are shared across accounts.

Two accounts at mybank may have rules files similar to these.

A checking account at mybank:

# Saved as: import/john/mybank/checking/mybank-checking.rules
include ../../../mybank-shared.rules
account1 Assets:Current:John:MyBank:Checking

Another account at mybank:

# Saved as: import/alice/mybank/savings/mybank-savings.rules
include ../../../mybank-shared.rules
account1 Assets:Current:Alice:MyBank:Savings

Where import/mybank-shared.rules may define some shared attributes:

skip 1

fields date, description, amount, balance

date-format %Y-%m-%d
currency $

Another possible approach could be to use your preprocess script to write out a CSV file that has extra fields for account1 and account2.

You could then create the above mentioned global import/mybank.rules with the fields defined more or less like this:

fields date, description, amount, balance, account1, account2

Stability of this Feature

Rules files are a stable feature within hledger, and we’re just using the normal hledger rules files. The account, bank and statement-specific rules files have been used and tested fairly extensively, and I don’t expect this to change.

Let me know if you think it should change.

Opening and Closing Balances

Opening Balances

hledger-flow looks for a file named YEAR-opening.journal in each account directory, where YEAR corresponds to an actual year directory, e.g. 1983 (if you have electronic statements dating back to 1983). Example: import/john/mybank/savings/1983-opening.journal

If it exists the file will automatically be included at the beginning of the generated journal include file for that year.

You need to edit this file for each account to specify the opening balance at the date of the first available transaction.

An opening balance may look something like this:

2018-06-01 Savings Account Opening Balance
    assets:Current:MyBank:Savings               $102.01
    equity:Opening Balances:MyBank:Savings

A Note of Caution Regarding Closing Balances

Closing your balances may result in some hledger queries showing zero values, or in issues with balance assertions.

Please have a look at the upstream hledger documentation on closing balances, e.g. here: https://hledger.org/hledger.html#close-usage

Some of the gotchas you may run into are also described in this hledger-flow issue.

Closing Balances

Similar to opening balances, hledger-flow looks for an optional file named YEAR-closing.journal in each account directory. Example: import/john/mybank/savings/1983-closing.journal

If it exists the file will automatically be included at the end of the generated journal include file for that year.

A closing balance may look something like this:

2018-06-01 Savings Account Closing Balance
    assets:Current:MyBank:Savings               $-234.56 = $0.00
    equity:Closing Balances:MyBank:Savings

Example Opening and Closing Journal Files

As an example, assuming that the relevant year is 2019 and hledger-flow is about to generate import/john/mybank/savings/2019-include.journal, then one or both of the following files will be added to the include file if they exist:

  1. import/john/mybank/savings/2019-opening.journal
  2. import/john/mybank/savings/2019-closing.journal

The opening.journal will be included just before the other included entries, while the closing.journal will be included just after the other entries in that include file.

An include file may look like this:

cat import/john/mybank/savings/2019-include.journal
### Generated by hledger-flow - DO NOT EDIT ###

include 2019-opening.journal
include 3-journal/2019/123456789_2019-01-30.journal
include 2019-closing.journal
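
The ordering of such an include file can be sketched in Python as follows. This only illustrates the convention described above and is not hledger-flow’s implementation; the function name include_lines and its parameters are hypothetical.

```python
def include_lines(year: str, journals: list[str],
                  has_opening: bool, has_closing: bool) -> list[str]:
    """Build include directives: opening first, journals sorted by name, closing last."""
    lines = []
    if has_opening:
        lines.append(f"include {year}-opening.journal")
    # Generated journals are listed in filename order.
    lines.extend(f"include {journal}" for journal in sorted(journals))
    if has_closing:
        lines.append(f"include {year}-closing.journal")
    return lines
```

Only the opening and closing files that actually exist contribute a line, so an account without a closing journal simply ends with its last imported statement.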

Stability of this Feature

Closing balances sometimes result in unexpected query results. In future we may change how/where the generated files include the closing journal.

We may also need to suggest some naming conventions for opening and closing balances so that reports can exclude some of these transactions.

It is also possible that we might want to change the name/location of the closing journal, but we’ll try to avoid this if possible, because that would require users to rename their existing files.

Price Files

hledger-flow looks for price files to include in each yearly include file.

For example, the presence of a file named ${BASE}/prices/2020/prices.journal will result in some extra include file magic.

The rest of this section assumes you’ll have a file named prices/2020/prices.journal which contains price data for the year 2020. The prices directory should be right at the top of your hledger-flow base directory, next to the import directory.

hledger-flow does not care how the price files got there. It only cares that you have a separate file per year, and that each file follows the above naming convention.

Here is an example script which downloads prices and follows the naming convention: https://gist.github.com/apauley/398fa031c202733959af76b3b8ce8197

After running an import with available price files you’ll see a line has been added to import/2020-include.journal:

include ../prices/2020/prices.journal

Hledger Directives

Hledger allows you to specify some useful directives which affect things such as number formatting.

A convenient place to put these directives within hledger-flow is a file named directives.journal (in your hledger-flow base directory).

If it exists hledger-flow will include it within the all-years.journal:

cat all-years.journal
### Generated by hledger-flow - DO NOT EDIT ###

include directives.journal
include import/all-years.journal

The preprocess Script

Sometimes the statements you get from your bank are less than suitable for automatic processing. Or maybe you just want to make it easier for the hledger rules file to do its thing by adding some useful columns.

If you put a script called preprocess in the account directory, e.g. import/john/mybank/savings/preprocess, then hledger-flow will call that script for each input statement.

The preprocess script will be called with 5 positional parameters:

  1. The path to the input statement, e.g. import/john/mybank/savings/1-in/2018/123456789_2018-06-30.csv
  2. The path to an output file that can be sent to hledger, e.g. import/john/mybank/savings/2-preprocessed/2018/123456789_2018-06-30.csv
  3. The name of the bank, e.g. mybank
  4. The name of the account, e.g. savings
  5. The name of the owner, e.g. john

Your preprocess script is expected to:

  • read the input file
  • write a new output file at the supplied path that works with your rules file
  • be idempotent. Running preprocess multiple times on the same files will produce the same result.
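
As a minimal sketch of such a script, assuming the statement is a plain CSV file that only needs blank rows removed and whitespace trimmed, a Python preprocess script could look like the one below. The cleanup steps are hypothetical and would differ per bank. Creating the output directory is a defensive assumption, since this document does not say whether hledger-flow creates it first.

```python
#!/usr/bin/env python3
import csv
import os
import sys


def preprocess(in_path: str, out_path: str, bank: str, account: str, owner: str) -> None:
    """Write a cleaned copy of the statement to out_path."""
    # Defensive assumption: make sure the output directory exists.
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, lineterminator="\n")
        for row in csv.reader(src):
            cells = [cell.strip() for cell in row]
            if any(cells):  # skip rows that are completely empty
                writer.writerow(cells)


# Only run when called with the five positional parameters.
if __name__ == "__main__" and len(sys.argv) == 6:
    preprocess(*sys.argv[1:6])
```

Because the output file is rewritten from scratch on every run, running the script again on the same input produces the same output.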

Stability of this Feature

Stable and tested.

The construct Script

If you need even more power and flexibility than what you can get from the preprocess script and hledger’s CSV import functionality, then you can create your own custom script to construct transactions exactly as you need them.

At the expense of more construction work for you, of course.

The construct script can be used in addition to the preprocess script, or on its own. But since the construct script is more powerful than the preprocess script, you could tell your construct script to do anything that the preprocess script would have done.

Save your construct script in the account directory, e.g. import/john/mybank/savings/construct.

hledger-flow will call your construct script with 5 positional parameters:

  1. The path to the input statement, e.g. import/john/mybank/savings/1-in/2018/123456789_2018-06-30.csv
  2. A “-” (indicating that output should be sent to stdout)
  3. The name of the bank, e.g. mybank
  4. The name of the account, e.g. savings
  5. The name of the owner, e.g. john

Your construct script is expected to:

  • read the input file
  • generate your own hledger journal transactions
  • be idempotent. Running construct multiple times on the same files should produce the same result.
  • send all journals to stdout. hledger-flow will pipe your standard output into hledger which will format it and save it to an output file.

You can still use stderr in your construct script for any other output that you may want to see.
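
A minimal Python sketch of a construct script is shown below. It assumes a CSV statement with the hypothetical column headers date, description and amount. The account naming scheme and the Expenses:Unknown placeholder are also assumptions made for illustration, not something hledger-flow prescribes.

```python
#!/usr/bin/env python3
import csv
import sys


def construct(in_path: str, bank: str, account: str, owner: str) -> str:
    """Build hledger journal text from a CSV with date, description, amount columns."""
    asset = f"Assets:Current:{owner}:{bank}:{account}"  # hypothetical naming scheme
    entries = []
    with open(in_path, newline="") as src:
        for row in csv.DictReader(src):
            entries.append(
                f"{row['date']} {row['description']}\n"
                f"    {asset}    ${row['amount']}\n"
                f"    Expenses:Unknown\n"  # placeholder for the other side
            )
    return "\n".join(entries)


# Only run when called with the five positional parameters.
if __name__ == "__main__" and len(sys.argv) == 6:
    in_path, _dash, bank, account, owner = sys.argv[1:6]
    print(f"constructing journal from {in_path}", file=sys.stderr)  # diagnostics
    sys.stdout.write(construct(in_path, bank, account, owner))
```

Journal entries go to stdout only, while the progress message goes to stderr so that it does not end up in the generated journal.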

Stability of this Feature

Stable and tested.

Manually Managed Journals

Not every transaction in your life comes with CSV statements.

Sometimes you just need to add a transaction for that time you loaned a friend some money.

hledger-flow looks for pre-import and post-import files related to each generated include file as part of the import.

You can enter your own transactions manually into these files.

You can run hledger-flow import --verbose to see exactly which files are being looked for.

As an example, assuming that the relevant year is 2019 and hledger-flow is about to generate import/john/2019-include.journal, then one or both of the following files will be added to the include file if they exist:

  1. import/john/_manual_/2019/pre-import.journal
  2. import/john/_manual_/2019/post-import.journal

The pre-import.journal will be included just before the other included entries, while the post-import.journal will be included just after the other entries in that include file.

An include file may look like this:

cat import/john/2019-include.journal
### Generated by hledger-flow - DO NOT EDIT ###

include _manual_/2019/pre-import.journal
include mybank/2019-include.journal
include otherbank/2019-include.journal
include _manual_/2019/post-import.journal

Stability of this Feature

It works, but the naming of _manual_ looks a bit weird. Should it be changed?

Validating an hledger repository using GitHub Actions

The following example was contributed by Amitai Burstein:

# .github/workflows/hledger-flow.yml

name: Validate hledger-flow

on: [push]

jobs:
  build:

    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v1

    - name: Install hledger
      run: docker pull dastapov/hledger

    - name: Install hledger-flow
      run: curl -L https://github.com/apauley/hledger-flow/releases/download/v0.12.4.0/hledger-flow_Linux_x86_64_v0.12.4.0_4b9b027.tar.gz | tar xvz && mv hledger-flow_Linux_x86_64_v0.12.4.0_4b9b027/hledger-flow .

    - name: Grant permissions to create files
      run: chmod 777 -R ./my-finances

    - name: Test hledger file
      run: docker run --name="ledger" -v $(pwd):/data dastapov/hledger ./hledger-flow import ./my-finances

Compatibility with Ledger

When writing out the journal include files, hledger-flow sorts the include statements by filename.

Ledger fails any balance assertions when the transactions aren’t included in chronological order.

An easy way around this is to name your input files so that March’s statement is listed before December’s statement.
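
As a small illustration in Python, file names containing a year-month-day date sort chronologically, while file names containing month names do not. The file names below are made up for the example, and include_order is a hypothetical helper that mimics a plain sort by filename.

```python
def include_order(filenames):
    """Order statements the way a plain sort by filename would."""
    return sorted(filenames)


# Year-month-day dates sort in chronological order.
iso_dated = ["123456789_2018-12-31.csv", "123456789_2018-03-31.csv"]

# Month names sort alphabetically, which puts December before March.
month_named = ["statement-March-2018.csv", "statement-December-2018.csv"]
```

Here include_order(iso_dated) lists the March statement first, whereas include_order(month_named) lists December first.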

Another option is to add --permissive to any ledger command.

So you should easily be able to use both ledger and hledger on these journals if you take care to avoid the few incompatibilities which exist (e.g. in your rules files or manual journals).

Project Goals

My hledger files started to collect a bunch of supporting code that wasn’t really specific to my financial situation.

I want to extract and share as much as possible of that supporting code.

Adept’s goals also resonated with me:

  • Tracking expenses should take as little time, effort and manual work as possible
  • Eventual consistency should be achievable: even if I can’t record something precisely right now, maybe I would be able to do it later, so I should be able to leave things half-done and pick them up later
  • Ability to refactor is a must. I want to be able to go back and change the way I am doing things, with as little effort as possible and without fear of irrevocably breaking things.

I’ve given a talk at Lambda Luminaries Johannesburg featuring hledger and hledger-flow.

Contributing to Hledger Flow

Have a look at the contribution guidelines.

FAQ

How do you balance transfers between 2 accounts when you have statements for both accounts?

The Problem

In your primary bank account you’ve happily been classifying transfers to a secondary account as just Expenses:OtherAccount.

But you’ve recently started processing the statements from the second account as well so that you can classify those expenses more accurately.

And now the balances of these two accounts are all wrong whenever the statements of both accounts deal with money transferred between them, because each statement records the same transfer and it ends up being counted twice.

In bank1.journal, imported from bank1.csv:

2018/11/09 Transfer from primary account to secondary account
    Assets:Bank1:Primary    $-200
    Assets:Bank2:Secondary

In bank2.journal, imported from bank2.csv:

2018/11/09 Transfer from primary account to secondary account
    Assets:Bank2:Secondary  $200
    Assets:Bank1:Primary

The Solution

As soon as you start importing statements for both accounts you will have to introduce an intermediate account for classification between these two accounts.

I use Assets:Transfers:*.

We may have reports looking at these transfer accounts at some point, so you should consider using the same names.

The above example then becomes as follows.

In bank1.journal, imported from bank1.csv:

2019-05-18 Transfer from primary account to secondary account
    Assets:Bank1:Primary         $-200
    Assets:Transfers:Bank1Bank2

In bank2.journal, imported from bank2.csv:

2019-05-18 Transfer from primary account to secondary account
    Assets:Bank2:Secondary       $200
    Assets:Transfers:Bank1Bank2

Any posting to Assets:Transfers:* indicates an “in-flight” amount. You would expect the balance of Assets:Transfers to be zero most of the time. Whenever it isn’t zero it means that you either don’t yet have the other side of the transfer, or that something is wrong in your rules.
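
The arithmetic behind this can be checked with a short Python sketch that sums the postings from the two examples. It is a plain illustration using the amounts shown above and does not involve hledger itself; the balances helper is hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def balances(postings):
    """Sum posting amounts per account."""
    totals = defaultdict(Decimal)
    for account, amount in postings:
        totals[account] += Decimal(amount)
    return dict(totals)


# Both statements post directly to the other bank account:
# the same transfer is counted twice.
direct = [
    ("Assets:Bank1:Primary", "-200"), ("Assets:Bank2:Secondary", "200"),  # bank1.csv
    ("Assets:Bank2:Secondary", "200"), ("Assets:Bank1:Primary", "-200"),  # bank2.csv
]

# Both statements post to the intermediate transfers account instead.
via_transfers = [
    ("Assets:Bank1:Primary", "-200"), ("Assets:Transfers:Bank1Bank2", "200"),    # bank1.csv
    ("Assets:Bank2:Secondary", "200"), ("Assets:Transfers:Bank1Bank2", "-200"),  # bank2.csv
]
```

With the direct postings the primary account ends up at -400 and the secondary account at 400. With the intermediate account they end up at -200 and 200, and Assets:Transfers:Bank1Bank2 nets to zero.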

You could theoretically just use Assets:Transfers without any subaccounts, but I found it useful to use subaccounts, because then the subaccounts can show me where I should look for any missing transfer transaction.

I typically use sorted names as the subaccount (Python code sample):

# Sorting makes the name independent of the transfer direction.
"Assets:Transfers:" + "".join(sorted(["Bank2", "Bank1"]))  # "Assets:Transfers:Bank1Bank2"

External references

This approach is based on what is described in Full-fledged hledger: https://github.com/adept/full-fledged-hledger/wiki/Adding-more-accounts#lets-make-sure-that-transfers-are-not-double-counted

The question was first asked in issue #51.

How does hledger-flow differ from Full-fledged Hledger?

Full-fledged Hledger is a brilliant system, and hledger-flow continues to learn much from it.

It has great documentation that does an excellent job of not only showing how things can be done, but also why it is such a great idea.

hledger-flow can be seen as a specific implementation of the Full-fledged Hledger system, with a few implementation details that are different.

Full-fledged Hledger (FFH) compared with Hledger Flow:

  • Full-fledged Hledger: FFH is a tutorial with helper scripts that you can start using and adapt to your needs.
    Hledger Flow: I started with FFH, and changed bits and pieces over time to suit my needs. The “owner/bank/account” structure for example.
  • Full-fledged Hledger: FFH is more open-ended: you can start with the basic scripts and over time turn it into something that solves your needs exactly. But you’ll also end up with more code that you need to maintain yourself.
    Hledger Flow: Hledger Flow is more opinionated and less open-ended. For example, you have to adopt the “owner/bank/account” structure precisely as specified. But this allows Hledger Flow to do more work for you.
  • Full-fledged Hledger: FFH uses scripts and Haskell/Shake build files that you can easily modify as you go along, but this requires a Haskell runtime to be installed everywhere it needs to run. The included docker image helps to make it less of an issue.
    Hledger Flow: Hledger Flow distributes a compiled binary. This means users or deployment targets don’t need extra dependencies installed, they can just run a CLI program. This also provides a clearer distinction between what is provided, and what users need to do.