Merge pull request #45 from shonfeder/develop

Release v1.0.0

shonfeder committed Jun 23, 2019
2 parents 5fd98f7 + c45fb74 commit 0e647b6
Showing 14 changed files with 734 additions and 158 deletions.
21 changes: 21 additions & 0 deletions .circleci/config.yml
@@ -0,0 +1,21 @@
version: 2

jobs:
build:
docker:
- image: swipl:stable

steps:
- run:
# TODO Build custom image to improve build time
name: Install Deps
command: |
apt update -y
apt install git make -y
- checkout

- run:
name: Run tests
command: |
make test
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
*~
44 changes: 44 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,44 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog][keep-a-change-log], and this project
adheres to [Semantic Versioning][semantic-versioning].

[keep-a-change-log]: https://keepachangelog.com/en/1.0.0/
[semantic-versioning]: https://semver.org/spec/v2.0.0.html

## [unreleased]

## [1.0.0]

### Added

- Support for numbers by [@Anniepoo](https://github.com/Anniepoo) #34
- Support for strings #37
- Code of Conduct #23

### Changed

- Spaces are now tagged with `space` instead of `spc` #41
- Tokenization of numbers and strings is enabled by default #40
- Options are now processed by a more conventional means #39
- The location for the pack's home is updated

## [0.1.2]

Prior to changelog.

## [0.1.1]

Prior to changelog.

## [0.1.0]

Prior to changelog.

[unreleased]: https://github.com/shonfeder/tokenize/compare/v1.0.0...HEAD
[1.0.0]: https://github.com/shonfeder/tokenize/compare/v0.1.2...v1.0.0
[0.1.2]: https://github.com/shonfeder/tokenize/compare/v0.1.1...v0.1.2
[0.1.1]: https://github.com/shonfeder/tokenize/compare/v0.1.0...v0.1.1
[0.1.0]: https://github.com/shonfeder/tokenize/releases/tag/v0.1.0
62 changes: 58 additions & 4 deletions CONTRIBUTING.md
@@ -5,20 +5,74 @@ reports, etc.

## Code of Conduct

-Please review and accept to our [code of conduct](CODE_OF_CONDUCT.md) prior to
+Please review and accept our [code of conduct](CODE_OF_CONDUCT.md) prior to
engaging in the project.

## Overall direction and aims

Consult the [`design_notes.md`](design_notes.md) to see the latest codified
design philosophy and principles.

## Setting up Development

-TODO
+1. Install [SWI-Prolog](http://www.swi-prolog.org/download/stable) (the `swipl` executable).
- Optionally, you may wish to use [swivm](https://github.com/fnogatz/swivm) to
manage multiple installed versions of swi-prolog.
2. Hack on the source code in [`./prolog`](./prolog).
3. Run and explore your changes by loading the file in `swipl` (or using your
   editor's IDE capabilities):
- Example in swipl

```prolog
# in ~/oss/tokenize on git:develop x [22:45:02]
$ cd ./prolog
# in ~/oss/tokenize/prolog on git:develop x [22:45:04]
$ swipl
Welcome to SWI-Prolog (threaded, 64 bits, version 8.0.2)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit http://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
% load the tokenize module
?- [tokenize].
true.
% experiment
?- tokenize("Foo bar baz", Tokens).
Tokens = [word(foo), space(' '), word(bar), space(' '), word(baz)].
% reload the module when you make changes to the source code
?- make.
% Updating index for library /usr/local/Cellar/swi-prolog/8.0.2/libexec/lib/swipl/library/
true.
% finished
?- halt.
```

Please ask here or in `##prolog` on [freenode](https://freenode.net/) if you
need any help! :)

## Running tests

Tests are located in the [`./test`](./test) directory. To run the test suite,
-simply execute the test file:
+simply execute `make test`:

```sh
-$ ./test/test.pl
+$ make test
% PL-Unit: tokenize .. done
% All 2 tests passed
```

If you are inside the `swipl` REPL, load the test file and query `run_tests`:

```prolog
?- [test/test].
?- run_tests.
% PL-Unit: tokenize .. done
% All 2 tests passed
true.
```
19 changes: 19 additions & 0 deletions Makefile
@@ -0,0 +1,19 @@
.PHONY: all version check install test clean

version := $(shell swipl -q -s pack -g 'version(V),writeln(V)' -t halt)
packfile = tokenize-$(version).tgz

SWIPL := swipl

all: test

version:
echo $(version)

check: test

install:
echo "(none)"

test:
@$(SWIPL) -s test/test.pl -g 'run_tests,halt(0)' -t 'halt(1)'
23 changes: 15 additions & 8 deletions README.md
@@ -1,30 +1,37 @@
-# Synopsis
+# `pack(tokenize) :-`

A modest tokenization library for SWI-Prolog, seeking a balance between
simplicity and flexibility.

[![CircleCI](https://circleci.com/gh/shonfeder/tokenize.svg?style=svg)](https://circleci.com/gh/shonfeder/tokenize)

## Synopsis

```prolog
?- tokenize(`\tExample Text.`, Tokens).
-Tokens = [cntrl('\t'), word(example), spc(' '), spc(' '), word(text), punct('.')]
+Tokens = [cntrl('\t'), word(example), space(' '), space(' '), word(text), punct('.')]
?- tokenize(`\tExample Text.`, Tokens, [cntrl(false), pack(true), cased(true)]).
-Tokens = [word('Example', 1), spc(' ', 2), word('Text', 1), punct('.', 1)]
+Tokens = [word('Example', 1), space(' ', 2), word('Text', 1), punct('.', 1)]
?- tokenize(`\tExample Text.`, Tokens), untokenize(Tokens, Text), format('~s~n', [Text]).
example text.
-Tokens = [cntrl('\t'), word(example), spc(' '), spc(' '), word(text), punct('.')],
-Text = [9, 101, 120, 97, 109, 112, 108, 101, 32|...]
+Tokens = [cntrl('\t'), word(example), space(' '), space(' '), word(text), punct('.')],
+Text = [9, 101, 120, 97, 109, 112, 108, 101, 32|...]
```

-# Description
+## Description

Module `tokenize` aims to provide a straightforward tool for tokenizing text into a simple format. It is the result of a learning exercise, and it is far from perfect. If there is sufficient interest from myself or anyone else, I'll try to improve it.

It is packaged as an SWI-Prolog pack, available [here](http://www.swi-prolog.org/pack/list?p=tokenize). Install it into your SWI-Prolog system with the query

```prolog
?- pack_install(tokenize).
```
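Once installed, the pack can be loaded and tried out directly in a session. The sketch below assumes the pack exposes a `tokenize` module loadable via `library(tokenize)`, as SWI-Prolog packs conventionally do; the token output follows the synopsis above:

```prolog
% Load the pack's module, then tokenize a string.
?- use_module(library(tokenize)).
true.

?- tokenize("Hello world", Tokens).
Tokens = [word(hello), space(' '), word(world)].
```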

Please [visit the wiki](https://github.com/aBathologist/tokenize/wiki/tokenize.pl-options-and-examples) for more detailed instructions and examples, including a full list of options supported.

-# Contributing
+## Contributing

See [CONTRIBUTING.md](./CONTRIBUTING.md).
4 changes: 4 additions & 0 deletions comment-wip/README.md
@@ -0,0 +1,4 @@
WIP code towards tokenization of comments.

It was extracted here because it's not ready for release, but we want to keep it
available for the author to resume work on it.
115 changes: 115 additions & 0 deletions comment-wip/comment.pl
@@ -0,0 +1,115 @@
:- module(comment,
[comment//2,
comment_rec//2,
comment_token//3,
comment_token_rec//3]).

/** <module> Tokenizing comments
This module defines matchers for comments, used by the tokenize module. (Note
that we use "matcher" as a name for DCG rules that match parts of the codes
list.)
@author Stefan Israelsson Tampe
@license LGPL v2 or later
Interface note:
A start or end matcher is a DCG rule that can be evaluated in two ways: with
no extra argument (--> call(StartMatcher)), in which case it simply matches
its token, or with one extra argument that produces the codes it matched,
e.g. --> call(StartMatcher, MatchedCodes). The matchers match the start and
end codes of a comment. The type 2matcher denotes such rules; the "2" refers
to the two calling conventions they support.
For examples see:
@see tests/test_comments.pl
The exported matcher predicates are:
comment(+Start:2matcher,+End:2matcher)
- anonymously match a non-recursive comment
comment_rec(+Start:2matcher,+End:2matcher)
- anonymously match a recursive comment
comment_token(+Start:2matcher,+End:2matcher,-Matched:list(codes))
- match a non-recursive comment, producing the matched sequence used
to build the resulting comment token
comment_token_rec(+Start:2matcher,+End:2matcher,-Matched:list(codes))
- match a recursive comment, producing the matched sequence used
to build the resulting comment token
*/



%% comment(+Start:2matcher,+End:2matcher)
% non recursive non tokenizing matcher

comment_body(E) --> call(E),!.
comment_body(E) --> [_],comment_body(E).

comment(S,E) -->
call(S),
comment_body(E).

%% comment_token(+Start:2matcher,+End:2matcher,-Matched:list(codes))
% non recursive tokenizing matcher

comment_body_token(E,Text) -->
call(E,HE),!,
{append(HE,[],Text)}.

comment_body_token(E,[X|L]) -->
[X],
comment_body_token(E,L).

comment_token(S,E,Text) -->
call(S,HS),
{append(HS,T,Text)},
comment_body_token(E,T).

%% comment_token_rec(+Start:2matcher,+End:2matcher,-Matched:list(codes))
% recursive tokenizing matcher

% Used as the initial continuation; it tidies up the matched result by
% closing the list with [] and passing the DCG state through unchanged.
comment_body_rec_start([], S, S).

comment_body_token_rec(_,E,Cont,Text) -->
call(E,HE),!,
{append(HE,T,Text)},
call(Cont,T).

comment_body_token_rec(S,E,Cont,Text) -->
call(S,HS),!,
{append(HS,T,Text)},
comment_body_token_rec(S,E,comment_body_token_rec(S,E,Cont),T).

comment_body_token_rec(S,E,Cont,[X|L]) -->
[X],
comment_body_token_rec(S,E,Cont,L).

comment_token_rec(S,E,Text) -->
call(S,HS),
{append(HS,T,Text)},
comment_body_token_rec(S,E,comment_body_rec_start,T).

%% comment_rec(+Start:2matcher,+End:2matcher)
% recursive non tokenizing matcher

comment_body_rec(_,E) -->
call(E),!.

comment_body_rec(S,E) -->
call(S),!,
comment_body_rec(S,E),
comment_body_rec(S,E).

comment_body_rec(S,E) -->
[_],
comment_body_rec(S,E).

comment_rec(S,E) -->
call(S),
comment_body_rec(S,E).
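To illustrate the 2matcher convention described in the module documentation, here is a hypothetical pair of start/end matchers for C-style comments and how they might be used with `comment//2` and `comment_token//3`. This is an illustrative sketch, not part of the commit; `c_start` and `c_end` are invented names:

```prolog
% Hypothetical start/end 2matchers for C-style /* ... */ comments.
% Each is defined twice, once per calling convention: matching only,
% and matching while also returning the codes consumed.
c_start --> "/*".
c_start(`/*`) --> "/*".

c_end --> "*/".
c_end(`*/`) --> "*/".

% Anonymous matching, and matching that captures the full comment text:
%
% ?- phrase(comment(c_start, c_end), `/* a comment */`).
% true.
%
% ?- phrase(comment_token(c_start, c_end, Text), `/* a comment */`),
%    format("~s~n", [Text]).
% /* a comment */
```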