Stok Footage

Continually experimenting with new ideas and techniques — Reconstructing, Developing, Modernising.

Fun Distractions

I often find that the “simplest” things can lead me down interesting paths, and being led down interesting paths can take more time than anticipated. This is an account of one of those things…

I have been a happy Mac user for a long time, and having spent more than twenty years using Unix in various flavours I often reach for my old standbys: shell scripts, makefiles, Perl programs, Ruby programs, C programs, and so on. Thanks to the Homebrew project lots of open source tools and languages are available for OS X, so the old habits are easy to hang on to.

The OS X ecosystem seems to encourage the development of nicely designed tools to help with workflows, and I’ve been using the Fujitsu ScanSnap iX500 and Hazel to deal with various statements for years: scan the paper copy, the ScanSnap OCRs it, Hazel files the PDF, and I shred the original.

As my banks and utilities finally got into the 21st century and offered PDF downloads of statements I just updated my Hazel rules to watch the downloads folder. For most downloaded bank statements and bills that worked fine, because there was a text layer in the downloaded PDF which Hazel could scan for details like the bill date, but one credit card provider’s PDF was not so easy to automate because it lacked a text layer — my lazy side did the obvious thing and I ignored it!

Originally I used Hazel to make sure that the statements were named and filed consistently, and it has been a godsend. I noticed that whenever I got a credit card statement or other bill I would read it, note the due date and amount, then save it as a PDF to let Hazel deal with it. I would then go to Fantastical 2 to add a reminder to pay the bill a week before it was due. Surely that tiresome mental date arithmetic and manual labour of creating reminders could be automated…

I started looking into AppleScript and Automator, since I'd need to create a reminder, and pretty quickly came up with this, which could be put into an embedded AppleScript script in Hazel:

# These would be detected by Hazel in the PDF
set new_balance to "123.45"
set due_date to "2015-08-31"

# This might be specific to a workflow, could be a parameter
# if we turn this into a function or script.
set payee to "Visa"

set list_name to "This Week"

set reminder_date to (date (due_date & " 8:00:00 AM")) - 7 * days
set reminder_name to "Pay " & payee & " $" & new_balance
set reminder_body to "Due on " & due_date

tell application "Reminders"
    tell list list_name
        make new reminder with properties {name:reminder_name, body:reminder_body, due date:reminder_date}
    end tell
end tell

The job soon turned out to be less straightforward than it looked. Although the credit card statement's layout looked obvious to a human reader, the order of the text in the PDF's text layer was not what I expected, so I wasn't able to set up a match rule in Hazel to pick out the new balance and the due date to feed into some AppleScript.

After some research, my approach was to convert the original statement into images using ghostscript, OCR those images with tesseract, and then search the resulting text files for the fragments I'm interested in. tesseract takes some time to process each page, so I wanted to do a little caching and be able to search from either end of the document to avoid OCR-ing unnecessary pages. As an old-time Unix user I felt happier doing this processing from a bash script, calling out to ruby to parse the date I found. Once I had parsed out the information I was interested in, I used osascript to run an AppleScript script to create the reminder.
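To make the caching and "search from either end" idea a bit more concrete, here is a minimal sketch of how it could look in bash. It is not the code from my script: the function names (pdf_to_images, ocr_page, find_in_pages), the choice of keeping each page's text next to its image, the 300 dpi resolution and the search pattern in the usage note are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: names, cache layout and resolution are illustrative assumptions.

# Render each page of a PDF to a JPEG with ghostscript.
# out_pattern is something like "/tmp/work/page_%04d.jpg".
pdf_to_images() {
  local pdf=$1 out_pattern=$2
  gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeg -r300 \
     -sOutputFile="$out_pattern" "$pdf"
}

# OCR one page image and print the path of the resulting text file.
# tesseract writes its output to "<outputbase>.txt"; if that file already
# exists and is non-empty, it is treated as a cached result and reused.
ocr_page() {
  local image=$1
  local base=${image%.*}
  if [ ! -s "$base.txt" ]; then
    tesseract "$image" "$base" >/dev/null 2>&1 || return 1
  fi
  printf '%s\n' "$base.txt"
}

# Print the first line matching an extended regex (case-insensitive),
# OCR-ing pages one at a time and stopping at the first page that matches.
# direction is "forward" (first page first) or "backward" (last page first).
find_in_pages() {
  local pattern=$1 direction=$2
  shift 2
  local -a pages=("$@")
  local n=${#pages[@]} i idx text match
  for ((i = 0; i < n; i++)); do
    if [ "$direction" = backward ]; then
      idx=$((n - 1 - i))
    else
      idx=$i
    fi
    text=$(ocr_page "${pages[idx]}") || continue
    if match=$(grep -m 1 -i -E -- "$pattern" "$text"); then
      printf '%s\n' "$match"
      return 0
    fi
  done
  return 1
}
```

With something like this, a call such as find_in_pages 'payment due date' backward "$tempdir"/page_*.jpg would only OCR pages until the first match, and a later search from the other end would reuse any page text already produced. The pattern here is made up; the real one depends on the wording of the statement.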

Now that I can extract text from one statement I should be able to go and attack the other statement which didn’t have any text layer at all, but that’s for another day.

The results of this are at https://github.com/mikestok/file-download-actions. The main logic of the bash script and the AppleScript is below.

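# Path of the statement to process; abort with a message if it is missing.
source_file=${1:?No source file supplied}

# Make tools installed in /usr/local/bin (e.g. via Homebrew) visible.
PATH="$PATH:/usr/local/bin"

# tempdir, make_images, find_due_date and find_new_balance are not shown
# here; see the repository linked above.
make_images "$source_file" "$tempdir/page_%04d.jpg"
due_date=$(find_due_date)
new_balance=$(find_new_balance)

# Only create the reminder when both values were found.
if [ -n "$due_date" ] && [ -n "$new_balance" ]; then
  osascript ~/Library/Scripts/make_payment_reminder.scpt \
    "Payment Reminders" "$due_date" "TD Visa $new_balance"
else
  exit 1
fi

on run argv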
    set list_name to item 1 of argv
    set due_date to item 2 of argv
    set reminder_name to item 3 of argv
   
    set reminder_date to (date (due_date & " 8:00:00 AM")) - 7 * days
    set reminder_body to "Due on " & due_date
   
    tell application "Reminders"
        tell list list_name
            make new reminder with properties {name:reminder_name, body:reminder_body, due date:reminder_date}
        end tell
    end tell
end run
