Skip to content

jkrukoff/llists

Repository files navigation

llists

Copyright (c) 2022 John Krukoff

Version: 1.2.0

Authors: John Krukoff ([email protected]).

Modules

lfiles
llists
llists_utils

Overview

An Erlang/OTP library for working with iterators; an opaque type that is evaluated as a lazily evaluated list of elements. This allows for memory efficient processing of sequential data and easy composition of processing on such lazy lists.

An interface identical to the stdlib lists module is implemented, allowing for easy conversion between list processing code and lazy list processing code.

Lazy Construction Worker

Getting Started

This library is published to hex.pm as llists. If you're using rebar3 as your build tool, it can be added as a dependency to your rebar.config as follows:

{deps, [{llists}]}.

Usage

To use this library, it is first necessary to create an iterator. This can either be done from an existing data structure using functions like llists:from_list/1 or by expressing the continuation programmatically using functions like llists:unfold/2.

Once an iterator is constructed, it can then be evaluated an element at a time using llists:next/1 or by using one of the many utility and evaluation functions in llists or llists_utils.

llists contains an iterator aware version of every function in the stdlib lists module, with guaranteed identical behaviour for side effect free and finite iterators.

lfiles contains file utility functions replicating functionality from the file and io modules. Read functions create impure iterators that should only be evaluated once. Write functions consume iterators and write the produced data to a file.

Iterators

An iterator is an opaque record created by the llists module to represent a continuation. When evaluated by llists:next/1 an iterator returns a lazy list, represented by one of two options:

  • []: An empty list, signaling that no more elements are available.

  • [Elem | Iterator]: An improper list consisting of an element and an iterator to continue evaluation.

Examples

As an example task, let us calculate the mean difference between a list of integer pairs. These pairs are stored in a file as comma separated values, one per line. We can use the llists module to both read lines from the file and calculate the average lazily, thus avoiding loading the entire file into memory during computation.

First, we need to construct the iterator:

> {ok, File} = file:open("doc/example.txt", [read]).
{ok,<0.227.0>}
> I = llists:unfold(fun(File) ->
	case file:read_line(File) of
		{ok, Data} ->
			{Data, File};
		eof ->
			file:close(File),
			none
	end
end, File).
{iterator,#Fun<llists.2.38967554>}

Next, a loop to parse the strings and calculate the mean difference:

> F = fun Calculate(I, Sum, Count) ->
	case llists:next(I) of
		[] ->
			Sum / Count;
		[Elem | Next] ->
			[A, B] = [list_to_integer(string:trim(Part)) ||
				Part <- string:split(Elem, ",")],
			Calculate(Next, Sum + (A - B), Count + 1)
	end
end.
#Fun<erl_eval.17.3316493>
> F(I, 0, 0).
-0.42

We could also make use of the utility functions in llists and lfiles and compose the same result as follows:

> {ok, File} = file:open("doc/example.txt", [read]).
{ok,<0.227.0>}
> I = lfiles:read_line(File).
{iterator,#Fun<llists.2.38967554>}
> Split = llists:map(fun (Elem) ->
	string:split(Elem, ",")
end, I).
{iterator,#Fun<llists.23.38967554>}
> Integers = llists:map(fun (Parts) ->
	[list_to_integer(string:trim(Part)) || Part <- Parts]
end, Split).
{iterator,#Fun<llists.23.38967554>}
> {Sum, Count} = llists:foldl(fun ([A, B], {Sum, Count}) ->
	{Sum + (A - B), Count + 1}
end, {0, 0}, Integers).
{-42,100}
> file:close(File).
ok
> Sum / Count.
-0.42

In both examples, we read only a single line of the file into memory at a time.

Notice that we couldn't use llists:sum/1 and llists:length/1 here instead of llists:foldl/3, since our input iterator has side effects and can only be evaluated once.

Contributing

Please fork the repo and submit a PR. Tests are run automatically on the master branch by GitHub actions or can be run locally via:

make deps
make check

If a Unix environment is not available, tests can be run inside a docker container via:

docker-compose build
docker-compose run check

Documentation is autogenerated using edown and edoc via:

make doc

Development of the library should be done with an Erlang/OTP version of 25 or greater.

The library requires an Erlang/OTP version of 21 or greater to function. Earlier versions may work if functions involving maps are avoided.

Lineage

Streams and lazily evaluated lists are common language constructs and much prior art exists. Scheme's SRFI-41 served as a useful design document to base this work on.

Other implementations that were used for reference: