Donating a library for reasoning about CBOR #2800

Open
Kiyoshi364 opened this issue Jan 30, 2025 · 16 comments

@Kiyoshi364

Hi, I would like to donate a library for Scryer Prolog.
Namely, the library is Kiyoshi364/cbor-pl.
(For future preservation, here is a fixed link: fixed link)
The library itself still needs some tweaks.
I will ask general questions about "donating libraries", then I will briefly talk about the library itself.

General Library Donation Questions

  • (G1) Is Scryer Prolog accepting library donations? If so, what is the procedure?
  • (G2) Which licenses does Scryer Prolog accept? Are there any suggestions?

About cbor-pl

It is a library for reasoning about CBOR (RFC 8949).
CBOR is a binary representation format for structured data, supporting "JSON-like" objects.

The library establishes a Prolog "data structure" for representing a CBOR item (cbor/1) and
provides a DCG, cbor_item//1, that describes the binary representation of a CBOR item.
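For readers new to CBOR: per RFC 8949, the first byte of every data item packs a major type (high-order 3 bits) and "additional information" (low-order 5 bits). The sketch below is hypothetical (the name cbor_initial_byte/3 is not taken from cbor-pl) and only illustrates how library(clpz) lets a single relation both decode and encode that byte, which is the same idea behind using one DCG for reading and writing:

```prolog
:- use_module(library(clpz)).

% Hypothetical helper (not part of cbor-pl): relates a CBOR initial byte
% to its major type (high-order 3 bits) and additional information
% (low-order 5 bits), as described in RFC 8949, Section 3.
cbor_initial_byte(Major, Info, Byte) :-
    Major in 0..7,
    Info in 0..31,
    Byte in 0..255,
    Byte #= Major * 32 + Info.
```

With Byte known, e.g. 0x83 (major type 4, an array, with 3 elements), constraint propagation narrows Major and Info; label/1 can be used to force concrete values if any remain.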

Specific notes and remarks about cbor-pl

  • (S1) It depends on library(clpz) to support reading/decoding and writing/encoding using the same DCG.
    Using the same DCG for reading and writing is good for many declarative reasons,
    but I consider CLPZ a big dependency, so it is worth noting that.

  • (S2) It does not integrate well with library(pio).
    The DCG cbor_item//1 describes lists of bytes [1] (integers in the range [0, 255], i.e. [0x00, 0xFF]), whereas
    library(pio) works with lists of chars (single-character atoms).

  • (S3) It does not have good support for floating-point numbers.
    I (the implementor) do not know how to reason about floating-point numbers and their binary representation inside Prolog.
    Currently, the library only supports the binary representation.
    There is an easy patch (once that reasoning is in place): change the implementation of size_value_float/3.
    Its current implementation is similar to size_value_float(S, V, F) :- S = _, V = F., where:

    1. S is an atom identifying the float size (16, 32 or 64 bits)
    2. V is the binary representation of the float
    3. F is the float in Prolog form (examples: 1.3, 0.123)
  • (S4) There are 2 "NOTE" comments in the implementation.
    They may indicate bugs in the interpreter or in a library (such as CLPZ). If so, I should open separate issues for them.

    1. The first NOTE is about a { true } I inserted inside a DCG clause. The { true } magically makes the test pass.
      I imagine it has something to do with tail-call/last-call optimization.
    2. Near the second NOTE, I have to use #R #= #X * (2 ^ #I) instead of #R #= #X << #I, because for big numbers the shift makes R negative.
      It appears to work in some directions, but not in others.
  • (S5) Currently, I have only documented the Prolog "data structure"; I still intend to document the behavior of cbor_item//1.
    Scryer Prolog has some way of turning a Prolog file into an HTML file (to make library pages for the website).
    If my documentation does not work well with it, please let me know.
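As background for (S3): CBOR's 16-bit floats use the IEEE 754 half-precision layout (1 sign bit s, 5 exponent bits e, 10 mantissa bits m), and RFC 8949 (Appendix D) gives a decoder for it. Written as a formula, this is roughly what the 16-bit case of size_value_float/3 would need to relate to V, with s = bit 15, e = bits 14..10 and m = bits 9..0:

$$
v(s, e, m) =
\begin{cases}
(-1)^{s} \cdot 2^{-14} \cdot \dfrac{m}{1024}, & e = 0 \text{ (subnormal)} \\
(-1)^{s} \cdot 2^{e-15} \cdot \left(1 + \dfrac{m}{1024}\right), & 1 \le e \le 30 \\
(-1)^{s} \cdot \infty, & e = 31,\ m = 0 \\
\text{NaN}, & e = 31,\ m \neq 0
\end{cases}
$$

For example, the bits 0x3C00 (s = 0, e = 15, m = 0) denote 1.0, and 0x3E00 (m = 512) denotes 1.5. The 32- and 64-bit formats follow the same pattern with wider fields (8/23 and 11/52 exponent/mantissa bits, biases 127 and 1023).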

Footnotes:
[1]: I'm assuming a byte is similar to a code (see get_code/1, put_code/1, ...). Possibly they are the same; I'm not sure.

@adri326
Contributor

adri326 commented Jan 30, 2025

For point S4.ii, this was recently fixed in #2777, as I had also noticed this incorrect behavior while refactoring the code around arithmetic operators. The condition for the bug was that the lhs had to be represented as an i64 (rather than a bigint) and the rhs had to try to make it bigger than 2 ^ 63 - 1.

S4.i should become an issue on its own.

I don't have a say in these kinds of decisions, but I will point out that having more code in any project (and especially a programming language) usually comes at the cost of:

  • Having to maintain it: what if it breaks with changes inside of scryer-prolog? What if a bug is found in the library?
  • Having to fossilize it: any public-facing code you publish now can't easily be amended later without breaking existing user code. It's not as big of a deal for free-standing libraries, as their version can simply be frozen, but freezing scryer-prolog's version would lock one out of bugfixes or performance optimizations.

@UWN

UWN commented Jan 30, 2025

... et dona ferentes ("even bearing gifts", as in "I fear the Greeks, even bearing gifts"). In any case, it seems that some aspects would be best raised as separate issues.

@triska
Contributor

triska commented Jan 30, 2025

a byte is similar to a code

Regarding this point: Indeed bytes and codes are both integers. More precisely, a byte is:

3.22 byte: An integer in the range [0..255] (see 7.1.2.1).

For comparison, a code is an integer from the collating sequence:

3.34 collating sequence: An implementation defined ordering defined on the set C of characters (see 6.6).

For instance, in Scryer Prolog, the code of 海 is 28023 (the character's Unicode code point), well outside the range of a byte:

?- char_code(海, Code).
   Code = 28023.

The most compact way to represent sequences of bytes in Scryer Prolog is to use lists of characters with codes in the range 0..255. You can specify the option type(binary) to phrase_from_file/3 to obtain this representation, i.e., to interpret the file contents as a sequence of bytes instead of UTF-8-encoded Unicode code points. When required, you can use char_code/2 as above to inspect such a character's "byte value", i.e., in this case its code.
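A minimal sketch of that approach, assuming phrase_from_file/3 with the type(binary) option described above; the predicate names chars_bytes/2 and file_bytes/2 are hypothetical:

```prolog
:- use_module(library(pio)).
:- use_module(library(dcgs)).
:- use_module(library(lists)).

% Hypothetical helper: relates a list of characters (each with a code in
% 0..255, as produced by type(binary)) to the corresponding byte values.
chars_bytes(Chars, Bytes) :-
    maplist(char_code, Chars, Bytes).

% Hypothetical helper: reads File as a sequence of bytes.
file_bytes(File, Bytes) :-
    phrase_from_file(seq(Chars), File, [type(binary)]),
    chars_bytes(Chars, Bytes).
```

Note that chars_bytes/2 is moded like char_code/2: at each position, either the character or the byte must be known.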

@hurufu
Contributor

hurufu commented Jan 31, 2025

Very cool, thanks for sharing. Do you also plan to support CDDL and some special tags from https://www.iana.org/assignments/cbor-tags/cbor-tags.xml ?

@Kiyoshi364
Author

About the "gift may actually be a burden".
Sure, it makes sense.
I will work towards releasing the library on its own.
If, in the future, Scryer Prolog thinks it is a good idea to have this library, just let me know.


(S4.i)
I will try to remove code, make a smaller example and make a separate issue.


You can specify the option type(binary) to phrase_from_file/3 to obtain this representation, i.e., to interpret the file contents as a sequence of bytes instead of UTF-8-encoded Unicode code points. When required, you can use char_code/2 as above to inspect such a character's "byte value", i.e., in this case its code.

Thanks for the entire explanation.
Thanks again for the quoted part.
How do I dereference the numbers (7.1.2.1)?
Are the numbers a section from a book?


Unfortunately, there is no plan for CDDL support,
especially because I have not read it yet.
For now, I think one needs to call a separate predicate to check/validate whether an item is in the language (some predicate similar to cbor/1).

If I were to add CDDL support, I would add an extra argument to cbor_item//1 with the description of the language.
This extra argument could also be passed in an "options argument", as in open/4.
(I'm not experienced with writing predicates that take an options argument.)

For tags and simple values, I could support known_tags(D) and known_simples(D) options,
where D is a "dictionary" (a list of pairs, i.e., [20-false, 21-true]).
But I have no plan for treating special cases,
for instance, recognizing tag 0 (standard date/time string) with a valid datetime string and returning a pretty datetime structure to the user.
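A minimal sketch of how such an option could be consulted, assuming the options are passed as a plain list, as with open/4; simple_value_name/3 is a hypothetical predicate, not part of cbor-pl:

```prolog
:- use_module(library(lists)).

% Hypothetical: looks up the name of a CBOR simple value in the
% known_simples(D) option, where D is a list of Simple-Name pairs.
% Without that option, the dictionary is empty and the lookup fails.
simple_value_name(Options, Simple, Name) :-
    (   member(known_simples(D), Options) -> true
    ;   D = []
    ),
    member(Simple-Name, D).
```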

I intend it to be a simple library for quickly lifting one out of raw bytes without losing any possible representation.
The library allows one to represent 0 in all 5 ways, and it also allows one to read/write CBOR items that are not well-formed.
I see the library as a good foundation for building a higher-level cbor encoder/decoder.
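To make "all 5 ways" concrete: for major type 0 (unsigned integers), the additional information is either the value itself (0..23) or 24, 25, 26 or 27, meaning the value follows in 1, 2, 4 or 8 big-endian bytes. The following is a hypothetical sketch of that rule (uint//1, be_bytes//2 and bytes_value/3 are illustrative names, not cbor-pl's code), again using library(clpz) so the same nonterminal parses and generates:

```prolog
:- use_module(library(clpz)).
:- use_module(library(dcgs)).
:- use_module(library(lists)).

% Hypothetical sketch: major type 0, so the initial byte equals the
% additional information (major type bits are all zero).
uint(N) --> [B], { B in 0..23, N #= B }.
uint(N) --> [24], be_bytes(1, N).
uint(N) --> [25], be_bytes(2, N).
uint(N) --> [26], be_bytes(4, N).
uint(N) --> [27], be_bytes(8, N).

% be_bytes(K, N): K big-endian bytes that encode the integer N.
be_bytes(K, N) --> { length(Bs, K) }, seq(Bs), { bytes_value(Bs, 0, N) }.

% bytes_value(Bytes, Acc0, N): folds the bytes into N, most significant first.
bytes_value([], N, N).
bytes_value([B|Bs], Acc0, N) :-
    B in 0..255,
    Acc #= Acc0 * 256 + B,
    bytes_value(Bs, Acc, N).
```

For example, ?- phrase(uint(0), Bs), label(Bs). enumerates [0], [24,0], [25,0,0], [26,0,0,0,0] and [27,0,0,0,0,0,0,0,0] on backtracking, while phrase(uint(N), [25,1,0]) yields N = 256.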

(Also, my vacation time is short and I am attempting to keep projects' scope small)


I am happy to close the issue,
but please wait for a response on how to look up triska's numbers.

Another option is to wait for me to open an issue about (S4.i).

@triska
Contributor

triska commented Jan 31, 2025

Are the numbers a section from a book?

The texts I quoted are from the Prolog ISO standard, and the numbers refer to sections in the standard, such as:

7.1.2.1 Bytes

B, a set of bytes, is a subset of I where:

B = {i ∈ I | 0 ≤ i ≤ 255}

closing the issue

Please do leave the issue open, since it is not yet resolved and may be interesting also for other contributors, to study and use the code and functionality. Please do consider adapting the title to (for example): "Donating a library for reasoning about CBOR" so that it is easier to find in the future. @mthom: May I also suggest tagging this issue with a new tag that is adequate for issues like this one, for example library-proposal.

Kiyoshi364 changed the title from "Donating a library" to "Donating a library for reasoning about CBOR" on Jan 31, 2025
@hurufu
Contributor

hurufu commented Jan 31, 2025

I'm repeating myself, but I think it would be very beneficial if there were a public repository of ISO Prolog modules compatible with Scryer Prolog, with relaxed submission quality requirements, so new libraries wouldn't get stuck in never-ending reviews, and mature libraries could be promoted to Scryer's main repo.

@infradig

infradig commented Feb 1, 2025

Are the numbers a section from a book?

The texts I quoted are from the Prolog ISO standard, and the numbers refer to sections in the standard, such as:

But be sure to shop around!

@UWN

UWN commented Feb 2, 2025

the Prolog ISO standard

See how to get it.

@Kiyoshi364
Author

Kiyoshi364 commented Feb 3, 2025

I am trying out this idea:

The most compact way to represent sequences of bytes in Scryer Prolog is to use lists of characters with codes in the range 0..255. You can specify the option type(binary) to phrase_from_file/3 to obtain this representation, i.e., to interpret the file contents as a sequence of bytes instead of UTF-8-encoded Unicode code points. When required, you can use char_code/2 as above to inspect such a character's "byte value", i.e., in this case its code.

It does not work for this library, because char_code/2 expects at least one of its arguments to be instantiated.
This library uses the same DCG for "reading" from and "writing" to lists;
because of that,
the library needs char_code/2 to work even when both of its arguments are uninstantiated.

I believe that a declarative version of char_code/2 would make this possible (one with delayed unification/attributed variables).
EDIT: freeze/2 may be helpful
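A minimal sketch of the freeze/2 idea applied to the DCG problem, assuming the character-list representation produced by type(binary); byte_char/2 and byte//1 are hypothetical names, not cbor-pl's API. Each character is tied to its byte value, and char_code/2 only runs once either side is known:

```prolog
:- use_module(library(freeze)).
:- use_module(library(clpz)).
:- use_module(library(dcgs)).

% Hypothetical: relates a byte B (0..255) to the character whose code is B.
% char_code/2 is delayed until either C or B becomes instantiated.
byte_char(B, C) :-
    B in 0..255,
    freeze(C, char_code(C, B)),
    freeze(B, char_code(C, B)).

% Hypothetical nonterminal: one byte, stored as a single character.
byte(B) --> [C], { byte_char(B, C) }.
```

Because B carries a clpz domain, a character whose code exceeds 255 (for example 海) is rejected rather than misread as a byte.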

@UWN

UWN commented Feb 3, 2025

a declarative version of char_code/2

Please note that char_code/2 is declarative. It perfectly behaves like a real relation, as long as it does not produce an error. It is moded, such that certain uses are not possible. Quite in contrast to non-relational code that runs entirely unprotected.

@Kiyoshi364
Author

Sorry, I misused the word "declarative".

I meant something like #X #= #Y + 1 from library(clpz),
where, if I know neither X nor Y, a constraint is posted and checked later, once they are unified.

@UWN

UWN commented Feb 3, 2025

something like #X #= #Y + 1

which also produces errors instead of failing, like

?- #X #= #Y + 1, Y = non_integer.
   type_error(integer,non_integer).

?- X in 1..Y.
   instantiation_error.

@Kiyoshi364
Author

Yes, they may still produce errors. I like type errors.
But I wish char_code/2 would work without instantiation errors.

?- char_code(X, Y).
   error(instantiation_error,char_code/2).
?- X = a, char_code(X, Y).
   X = a, Y = 97.
?- #X #= #Y + 1.
   clpz:(Y+1#=X).

@bakaq
Contributor

bakaq commented Feb 4, 2025

You can use library(freeze) to do some basic "do this when you are able" things to avoid instantiation errors. For example, you can implement this version of char_code/2 that works a bit more like library(clpz):

?- [user].
:- use_module(library(freeze)).
char_code_(Char, Code) :-
    freeze(Char, char_code(Char, Code)),
    freeze(Code, char_code(Char, Code)).

?- char_code_(a, C).
   C = 97.
?- char_code_(C, 97).
   C = a.
?- char_code_(Char, Code).
   freeze:freeze(Char,char_code(Char,Code)), freeze:freeze(Code,char_code(Char,Code)).
?- char_code_(Char, Code), Char = a.
   Char = a, Code = 97.
?- char_code_(Char, Code), Code = 97.
   Char = a, Code = 97.

With library(when) or raw attributed variables (which is what both of these libraries and also library(clpz) use under the hood) you can work around even more complicated situations.
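For comparison, a sketch of the same relation using library(when), assuming its when/2 accepts disjunctive nonvar/1 conditions; char_code_when/2 is a hypothetical name:

```prolog
:- use_module(library(when)).

% Hypothetical variant of char_code_/2: a single delayed call to
% char_code/2 that runs when Char or Code becomes instantiated,
% whichever happens first.
char_code_when(Char, Code) :-
    when((nonvar(Char) ; nonvar(Code)), char_code(Char, Code)).
```

Unlike the freeze/2 version above, which registers two goals and therefore calls char_code/2 a second time once the other argument is bound, this posts a single goal.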

Also, just so you know, we have a relatively active GitHub Discussions forum that may be more appropriate for questions like this.

@Kiyoshi364
Author

Thanks!

7 participants