Is it possible to return raw file offsets from within the tar? #162
Comments
Here is what I puzzled together. Does this seem right? Is there a more efficient way to do this?

    const fs = require('fs');
    const tar = require('tar-stream');
    const gunzip = require('gunzip-maybe');

    function tar_index(path) {
      const input = fs.createReadStream(path);
      const output = [];
      return new Promise(function (resolve, reject) {
        function process_entry(header, stream, next) {
          // Guess: this internal counter seems to hold the bytes consumed so far.
          const offset = extract._buffer.shifted;
          output.push({
            name: header.name,
            offset: offset,
            size: header.size
          });
          stream.on('end', function () {
            next(); // ready for the next file
          });
          stream.on('error', reject);
          stream.resume(); // drain the entry without keeping its data
        }
        function finish_stream() {
          resolve(output);
        }
        // Single extractor instance; the unused duplicate declaration was dropped.
        const extract = tar.extract({ allowUnknownFormat: true })
          .on('entry', process_entry).on('finish', finish_stream).on('error', reject);
        input.pipe(gunzip()).pipe(extract);
      }).finally(function () {
        input.destroy();
      });
    }

    tar_index('myfile.tar.gz').then(console.log);
Should be easy to add, yeah. Feel free to PR that.
@mafintosh sorry to ping you again, trying to understand this code. Am I correct that
I would track it independently, seems much simpler, i.e. a property that tracks byteOffset and is updated every time we eat from the buffer. Then add that to the header object we emit.
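To make the suggestion concrete, here is a minimal standalone sketch of the idea, not tar-stream's actual code. It walks an uncompressed tar held in a Buffer, advances a running `byteOffset` every time bytes are consumed, and attaches that value to each header object. The names `indexTar`, `eat`, `readString` and `BLOCK` are hypothetical, and the sketch assumes plain ustar-style headers (no checksum validation, no base-256 sizes); pax and GNU long-name records show up as entries of their own instead of being merged into the following file.

```javascript
const BLOCK = 512; // tar headers and data are laid out in 512-byte blocks

// Read a NUL-terminated string field from a header block.
function readString(block, start, length) {
  const field = block.subarray(start, start + length);
  const nul = field.indexOf(0);
  return field.toString('utf8', 0, nul === -1 ? field.length : nul);
}

// Hypothetical helper: list the entries of an uncompressed tar Buffer.
function indexTar(tarBuffer) {
  const entries = [];
  let byteOffset = 0; // total bytes consumed so far

  // Every read goes through eat(), so byteOffset can never drift.
  function eat(n) {
    const chunk = tarBuffer.subarray(byteOffset, byteOffset + n);
    byteOffset += n;
    return chunk;
  }

  while (byteOffset + BLOCK <= tarBuffer.length) {
    const block = eat(BLOCK);
    if (block.every((b) => b === 0)) break; // all-zero block marks the end

    const header = {
      name: readString(block, 0, 100),
      // The size field is an octal string at bytes 124..135.
      size: parseInt(readString(block, 124, 12).trim(), 8) || 0,
      // The header block has already been eaten, so this is where data starts.
      byteOffset: byteOffset
    };
    entries.push(header);

    // Skip the data, which is padded up to a multiple of 512 bytes.
    eat(Math.ceil(header.size / BLOCK) * BLOCK);
  }
  return entries;
}
```

The point of routing all reads through one helper is that the offset stays correct regardless of how the parser buffers or slices its input, which is what makes tracking it independently simpler than reusing an internal buffer counter.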
I'll give that another try. FWIW, the goal is to mmap the tar file in WASM using the emscripten packaging format. So we need the server to generate an index for the tar that looks like so: https://jeroen.r-universe.dev/bin/emscripten/contrib/4.4/jsonlite_1.8.9.js.metadata
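For illustration, a rough sketch of turning the `{name, offset, size}` entries from the script above into that kind of index. The output shape is an assumption on my part (a `files` array with `filename`, `start`, `end`, plus `remote_package_size`), as is the leading slash on file names, so check both against the linked metadata file. `toPackageMetadata` is a hypothetical helper, not part of tar-stream or emscripten.

```javascript
// Hypothetical helper: map tar index entries to an emscripten-style
// metadata object. Field names here are assumed, not confirmed.
function toPackageMetadata(entries, packageSize) {
  return {
    files: entries.map(function (entry) {
      return {
        // Assumption: paths are absolute; strip a leading "./" or "/" first.
        filename: '/' + entry.name.replace(/^\.?\//, ''),
        start: entry.offset,            // first byte of the file data
        end: entry.offset + entry.size  // one past the last byte
      };
    }),
    remote_package_size: packageSize    // size of the uncompressed tar in bytes
  };
}
```

Directories and other zero-length entries would come out with `start === end`; whether they should be filtered out depends on what the consumer expects.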
Do you suggest we only have to add a line to lines 44 to 62 in 126968f?
I would like to generate an index of a tar file with the start and end offset of each file in the tarball, such that I can mmap or extract a single file later on. Is this possible with tar-stream?
The documentation of headers only mentions the size of each file, but I would also need the offset within the tar.
From hacking, it looks like the global property extract._buffer.shifted contains what I need, but this is mostly a guess. It would be nice if the header callback could include the offset property for each entry.
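As a sketch of the "extract a single file later on" half of the use case: once an index entry with a data offset and size is known, the file can be read straight out of the tar with a positioned read. This assumes the offsets refer to the uncompressed tar, so a .tar.gz would have to be decompressed first. `readEntry` is a hypothetical helper and the `{offset, size}` entry shape is an assumption; only standard Node.js `fs` calls are used.

```javascript
const fs = require('fs');

// Hypothetical helper: read one file's bytes from an uncompressed tar,
// given an index entry of the assumed form { offset, size }.
function readEntry(tarPath, entry) {
  const fd = fs.openSync(tarPath, 'r');
  try {
    const out = Buffer.alloc(entry.size);
    let done = 0;
    // readSync may return fewer bytes than requested, so loop until complete.
    while (done < entry.size) {
      const n = fs.readSync(fd, out, done, entry.size - done, entry.offset + done);
      if (n === 0) throw new Error('unexpected end of file');
      done += n;
    }
    return out;
  } finally {
    fs.closeSync(fd);
  }
}
```

Usage would look like `readEntry('myfile.tar', index[0])`, where `index` is whatever the indexing step produced.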