Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding an unpack_docs/2 function #63

Closed
wants to merge 1 commit into from

Conversation

supersimple
Copy link
Member

This is 1 phase of issue #61

@wojtekmach can you review this thoroughly and give me feedback? I am not very familiar with Erlang, but I am trying to get better.

Copy link
Member

@wojtekmach wojtekmach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@supersimple thanks for picking this up!

One way to iterate on this feature is to test this end-to-end like this:

$ curl -O https://repo.hex.pm/docs/hex_core-0.5.0.tar.gz
$ rebar3 shell
> {ok, Files} = file:read_file('hex_core-0.5.0.tar.gz'), hex_tarball:unpack_docs(Tarball, memory).

you can try writing tests too, e.g. we create a docs tarball using hex_tarball:create_docs and try to unpack it with the new function.

%% metadata => #{<<"name">> => <<"foo">>, ...}}}
%% '''
-spec unpack_docs(tarball(), memory) ->
{ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs tarball has neither checksum nor metadata. I think the spec should be:

-spec unpack_docs(tarball(), memory) -> {ok, contents()} | {error, term()};
                 (tarball(), filename()) -> ok | {error, term()}.

Thought I'm not sure about using type tarball() here. What I mean is, does it make sense to differentiate between package tarball and docs tarball in types? The former is a tar of checksum, metadata.config, contents.tar.gz; the latter is an compressed tar of actual docs files. So maybe we should have a separate docs_tarball() type?

I'm ok with re-using the same type, they're both binaries under the hood but curious what others think. cc @ericmj @ferd @tsloughter

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

structurally I bet the types will be identical (they're both tarballs), so what you have here is pretty much as good as it gets for analysis. The only distinction you could make is add aliases to signify intent (i.e. -type docs_tarball() :: tarball() which would at the very least let you make a distinction similar to meters and feet even if they'd all be numbers.

I do prefer stricter types in such a case because the whole thing is closer to self-documenting, but it's got no analysis benefits (I don't think Dialyzer is fancy enough to track it)

{error, {tarball, too_big}};

unpack_docs(Tarball, Output) ->
case hex_erl_tar:extract({binary, Tarball}, [memory]) of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to pass compressed option here

{error, {tarball, empty}};

{ok, FileList} ->
do_unpack(maps:from_list(FileList), Output);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do_unpack performs validations like there should be a CHECKSUM file etc, they're not needed for docs tarballs.

@ericmj
Copy link
Member

ericmj commented Nov 12, 2019

Reopened here: #74

@ericmj ericmj closed this Nov 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants