-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding an unpack_docs/2 function #63
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@supersimple thanks for picking this up!
One way to iterate on this feature is to test this end-to-end like this:
$ curl -O https://repo.hex.pm/docs/hex_core-0.5.0.tar.gz
$ rebar3 shell
> {ok, Files} = file:read_file('hex_core-0.5.0.tar.gz'), hex_tarball:unpack_docs(Tarball, memory).
you can try writing tests too, e.g. we create a docs tarball using hex_tarball:create_docs
and try to unpack it with the new function.
%% metadata => #{<<"name">> => <<"foo">>, ...}}} | ||
%% ''' | ||
-spec unpack_docs(tarball(), memory) -> | ||
{ok, #{checksum => checksum(), metadata => metadata(), contents => contents()}} | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs tarball has neither checksum nor metadata. I think the spec should be:
-spec unpack_docs(tarball(), memory) -> {ok, contents()} | {error, term()};
(tarball(), filename()) -> ok | {error, term()}.
Thought I'm not sure about using type tarball()
here. What I mean is, does it make sense to differentiate between package tarball and docs tarball in types? The former is a tar of checksum, metadata.config, contents.tar.gz; the latter is an compressed tar of actual docs files. So maybe we should have a separate docs_tarball()
type?
I'm ok with re-using the same type, they're both binaries under the hood but curious what others think. cc @ericmj @ferd @tsloughter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
structurally I bet the types will be identical (they're both tarballs), so what you have here is pretty much as good as it gets for analysis. The only distinction you could make is add aliases to signify intent (i.e. -type docs_tarball() :: tarball()
which would at the very least let you make a distinction similar to meters and feet even if they'd all be numbers.
I do prefer stricter types in such a case because the whole thing is closer to self-documenting, but it's got no analysis benefits (I don't think Dialyzer is fancy enough to track it)
{error, {tarball, too_big}}; | ||
|
||
unpack_docs(Tarball, Output) -> | ||
case hex_erl_tar:extract({binary, Tarball}, [memory]) of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to pass compressed
option here
{error, {tarball, empty}}; | ||
|
||
{ok, FileList} -> | ||
do_unpack(maps:from_list(FileList), Output); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do_unpack performs validations like there should be a CHECKSUM file etc, they're not needed for docs tarballs.
Reopened here: #74 |
This is 1 phase of issue #61
@wojtekmach can you review this thoroughly and give me feedback? I am not very familiar with Erlang, but I am trying to get better.