Blob and Files #10539
Replies: 5 comments 2 replies
-
Hey @jimmywarting
Surprised there is no web platform test for this. Anyway, it should be super simple to fix.
This is on our radar. Currently blobs are always fully stored in isolate memory. Our implementation is technically spec compliant (we pass the web platform tests), but it is not ideal and not what the spec intends. In the future we want to transparently store blobs on disk after a certain in-memory size threshold has been reached (similar to how it is done in Chromium: https://chromium.googlesource.com/chromium/src/+/master/storage/browser/blob/README.md). I am planning to tackle this either this or next quarter. Once we have the ability to asynchronously stream large blobs to/from disk (for example …
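For illustration, a rough sketch of that threshold idea (not Deno's actual design; names and numbers are made up):

const MEMORY_THRESHOLD = 1024 * 1024; // hypothetical 1 MiB cut-off

async function storeBlobBytes(bytes) {
  // Small blobs stay in isolate memory, as today.
  if (bytes.byteLength <= MEMORY_THRESHOLD) {
    return { kind: "memory", bytes };
  }
  // Larger blobs spill to a temp file; only the reference stays in memory.
  const path = await Deno.makeTempFile({ prefix: "blob_" });
  await Deno.writeFile(path, bytes);
  return { kind: "file", path, size: bytes.byteLength };
}

async function readBlobBytes(ref) {
  return ref.kind === "memory" ? ref.bytes : await Deno.readFile(ref.path);
}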
-
I can retract my Symbol.toStringTag issue. I just had an old Deno version and saw that it is there now.
-
Couldn't bear the wait 😄 Maybe this will help to speed things up? I took some stuff from my own impl and switched the byte source from an ArrayBuffer to a sequence of byte parts, so it can handle async sources.
// Copyright 2018-2021 the Deno authors. All rights reserved. MIT license.
// @ts-check
/// <reference no-default-lib="true" />
/// <reference path="../../core/lib.deno_core.d.ts" />
/// <reference path="../webidl/internal.d.ts" />
/// <reference path="../web/internal.d.ts" />
/// <reference path="../web/lib.deno_web.d.ts" />
/// <reference path="./internal.d.ts" />
/// <reference lib="esnext" />
"use strict";
((window) => {
const core = window.Deno.core;
const webidl = window.__bootstrap.webidl;
// TODO(lucacasonato): this needs to not be hardcoded and instead depend on
// host os.
const isWindows = false;
const POOL_SIZE = 65536;
/**
* @param {string} input
* @param {number} position
* @returns {{result: string, position: number}}
*/
function collectCodepointsNotCRLF(input, position) {
// See https://w3c.github.io/FileAPI/#convert-line-endings-to-native and
// https://infra.spec.whatwg.org/#collect-a-sequence-of-code-points
const start = position;
for (
let c = input.charAt(position);
position < input.length && !(c === "\r" || c === "\n");
c = input.charAt(++position)
);
return { result: input.slice(start, position), position };
}
/**
* @param {string} s
* @returns {string}
*/
function convertLineEndingsToNative(s) {
const nativeLineEnding = isWindows ? "\r\n" : "\n";
let { result, position } = collectCodepointsNotCRLF(s, 0);
while (position < s.length) {
const codePoint = s.charAt(position);
if (codePoint === "\r") {
result += nativeLineEnding;
position++;
if (position < s.length && s.charAt(position) === "\n") {
position++;
}
} else if (codePoint === "\n") {
position++;
result += nativeLineEnding;
}
const { result: token, position: newPosition } = collectCodepointsNotCRLF(
s,
position,
);
position = newPosition;
result += token;
}
return result;
}
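// Example: with "\n" as the native ending (non-Windows),
// convertLineEndingsToNative("a\r\nb\rc\n") returns "a\nb\nc\n";
// both "\r\n" and a lone "\r" collapse to "\n".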
/** @param {(Blob | Uint8Array)[]} parts */
async function * toIterator (parts) {
for (const part of parts) {
if (part instanceof Blob) {
yield * part.stream();
} else if (ArrayBuffer.isView(part)) {
let position = part.byteOffset;
const end = part.byteOffset + part.byteLength;
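// Copy out at most POOL_SIZE bytes at a time so one huge view doesn't
// become a single giant chunk downstream.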
while (position !== end) {
const size = Math.min(end - position, POOL_SIZE);
const chunk = part.buffer.slice(position, position + size);
yield new Uint8Array(chunk);
position += chunk.byteLength;
}
}
}
}
/** @typedef {BufferSource | Blob | string} BlobPart */
/**
* @param {BlobPart[]} parts
* @param {string} endings
* @returns {{ bytesArrays: (Uint8Array | Blob)[], size: number }}
*/
function processBlobParts(parts, endings) {
/** @type {(Uint8Array | Blob)[]} */
const bytesArrays = [];
let size = 0;
for (const element of parts) {
if (element instanceof ArrayBuffer) {
bytesArrays.push(new Uint8Array(element.slice(0)));
size += element.byteLength;
} else if (ArrayBuffer.isView(element)) {
const buffer = element.buffer.slice(
element.byteOffset,
element.byteOffset + element.byteLength,
);
size += element.byteLength;
bytesArrays.push(new Uint8Array(buffer));
} else if (element instanceof Blob) {
bytesArrays.push(element);
size += element.size;
} else if (typeof element === "string") {
const chunk = core.encode(
endings === "native" ? convertLineEndingsToNative(element) : element,
);
size += chunk.byteLength;
bytesArrays.push(chunk);
} else {
throw new TypeError("Unreachable code (invalid element type)");
}
}
return {bytesArrays, size};
}
/**
* @param {string} str
* @returns {string}
*/
function normalizeType(str) {
let normalizedType = str;
if (!/^[\x20-\x7E]*$/.test(str)) {
normalizedType = "";
}
return normalizedType.toLowerCase();
}
const _byteSequence = Symbol("[[ByteSequence]]");
class Blob {
get [Symbol.toStringTag]() {
return "Blob";
}
/** @type {string} */
#type;
/** @type {number} */
#size;
/** @type {(Uint8Array | Blob)[]} */
#parts;
/**
* @param {BlobPart[]} blobParts
* @param {BlobPropertyBag} options
*/
constructor(blobParts = [], options = {}) {
const prefix = "Failed to construct 'Blob'";
blobParts = webidl.converters["sequence<BlobPart>"](blobParts, {
context: "Argument 1",
prefix,
});
options = webidl.converters["BlobPropertyBag"](options, {
context: "Argument 2",
prefix,
});
this[webidl.brand] = webidl.brand;
const { bytesArrays, size } = processBlobParts(
blobParts,
options.endings,
);
/** @type {(Uint8Array | Blob)[]} */
this.#parts = bytesArrays;
this.#size = size;
this.#type = normalizeType(options.type);
}
/** @returns {number} */
get size() {
webidl.assertBranded(this, Blob);
return this.#size;
}
/** @returns {string} */
get type() {
webidl.assertBranded(this, Blob);
return this.#type;
}
/**
* @param {number} [start]
* @param {number} [end]
* @param {string} [contentType]
* @returns {Blob}
*/
slice(start, end, contentType) {
webidl.assertBranded(this, Blob);
const prefix = "Failed to execute 'slice' on 'Blob'";
if (start !== undefined) {
start = webidl.converters["long long"](start, {
clamp: true,
context: "Argument 1",
prefix,
});
}
if (end !== undefined) {
end = webidl.converters["long long"](end, {
clamp: true,
context: "Argument 2",
prefix,
});
}
if (contentType !== undefined) {
contentType = webidl.converters["DOMString"](contentType, {
context: "Argument 3",
prefix,
});
}
const { size } = this;
// Treat missing arguments per spec: start defaults to 0, end to size.
if (start === undefined) start = 0;
if (end === undefined) end = size;
let relativeStart = start < 0 ? Math.max(size + start, 0) : Math.min(start, size);
let relativeEnd = end < 0 ? Math.max(size + end, 0) : Math.min(end, size);
const span = Math.max(relativeEnd - relativeStart, 0);
const parts = this.#parts;
const blobParts = [];
let added = 0;
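// Walk the parts: skip those wholly before the slice window, then take
// zero-copy views (subarray / Blob reference) until `span` bytes are covered.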
for (const part of parts) {
const size = ArrayBuffer.isView(part) ? part.byteLength : part.size;
if (relativeStart && size <= relativeStart) {
// Skip the beginning and change the relative
// start & end position as we skip the unwanted parts
relativeStart -= size;
relativeEnd -= size;
} else {
let chunk;
if (ArrayBuffer.isView(part)) {
chunk = part.subarray(relativeStart, Math.min(size, relativeEnd));
added += chunk.byteLength;
} else {
chunk = part.slice(relativeStart, Math.min(size, relativeEnd));
added += chunk.size;
}
// Advance the window past this part so the next part's end is relative to it.
relativeEnd -= size;
blobParts.push(chunk);
relativeStart = 0; // All next sequential parts should start at 0
// don't add the overflow to new blobParts
if (added >= span) {
break;
}
}
}
/** @type {string} */
let relativeContentType;
if (contentType === undefined) {
relativeContentType = "";
} else {
relativeContentType = normalizeType(contentType);
}
const blob = new Blob([], {type: relativeContentType});
blob.#size = span;
blob.#parts = blobParts;
return blob;
}
/**
* @returns {ReadableStream<Uint8Array>}
*/
stream() {
webidl.assertBranded(this, Blob);
const partIterator = toIterator(this.#parts);
const stream = new ReadableStream({
/** @param {ReadableStreamDefaultController<Uint8Array>} controller */
async pull(controller) {
const { value, done } = await partIterator.next();
// Don't enqueue after close(); stop once the iterator is exhausted.
if (done) return controller.close();
controller.enqueue(value);
},
});
return stream;
}
/**
* @returns {Promise<string>}
*/
async text() {
webidl.assertBranded(this, Blob);
const buffer = await this.arrayBuffer();
return core.decode(new Uint8Array(buffer));
}
/**
* @returns {Promise<ArrayBuffer>}
*/
async arrayBuffer() {
webidl.assertBranded(this, Blob);
const stream = this.stream();
const bytes = new Uint8Array(this.size);
let offset = 0;
for await (const chunk of stream) {
bytes.set(chunk, offset);
offset += chunk.length;
}
return bytes.buffer;
}
}
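// Example (illustrative):
//   const blob = new Blob(["hello ", new Uint8Array([119, 111, 114, 108, 100])]);
//   blob.size === 11; blob.type === "";
//   blob.slice(0, 5).text().then(console.log); // "hello"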
webidl.configurePrototype(Blob);
webidl.converters["Blob"] = webidl.createInterfaceConverter("Blob", Blob);
webidl.converters["BlobPart"] = (V, opts) => {
// Union for ((ArrayBuffer or ArrayBufferView) or Blob or USVString)
if (typeof V == "object") {
if (V instanceof Blob) {
return webidl.converters["Blob"](V, opts);
}
if (V instanceof ArrayBuffer || V instanceof SharedArrayBuffer) {
return webidl.converters["ArrayBuffer"](V, opts);
}
if (ArrayBuffer.isView(V)) {
return webidl.converters["ArrayBufferView"](V, opts);
}
}
return webidl.converters["USVString"](V, opts);
};
webidl.converters["sequence<BlobPart>"] = webidl.createSequenceConverter(
webidl.converters["BlobPart"],
);
webidl.converters["EndingType"] = webidl.createEnumConverter("EndingType", [
"transparent",
"native",
]);
const blobPropertyBagDictionary = [
{
key: "type",
converter: webidl.converters["DOMString"],
defaultValue: "",
},
{
key: "endings",
converter: webidl.converters["EndingType"],
defaultValue: "transparent",
},
];
webidl.converters["BlobPropertyBag"] = webidl.createDictionaryConverter(
"BlobPropertyBag",
blobPropertyBagDictionary,
);
const _Name = Symbol("[[Name]]");
const _LastModified = Symbol("[[LastModified]]");
class File extends Blob {
get [Symbol.toStringTag]() {
return "File";
}
/** @type {string} */
[_Name];
/** @type {number} */
[_LastModified];
/**
* @param {BlobPart[]} fileBits
* @param {string} fileName
* @param {FilePropertyBag} options
*/
constructor(fileBits, fileName, options = {}) {
const prefix = "Failed to construct 'File'";
webidl.requiredArguments(arguments.length, 2, { prefix });
fileBits = webidl.converters["sequence<BlobPart>"](fileBits, {
context: "Argument 1",
prefix,
});
fileName = webidl.converters["USVString"](fileName, {
context: "Argument 2",
prefix,
});
options = webidl.converters["FilePropertyBag"](options, {
context: "Argument 3",
prefix,
});
super(fileBits, options);
/** @type {string} */
this[_Name] = fileName;
if (options.lastModified === undefined) {
/** @type {number} */
this[_LastModified] = new Date().getTime();
} else {
/** @type {number} */
this[_LastModified] = options.lastModified;
}
}
/** @returns {string} */
get name() {
webidl.assertBranded(this, File);
return this[_Name];
}
/** @returns {number} */
get lastModified() {
webidl.assertBranded(this, File);
return this[_LastModified];
}
}
webidl.configurePrototype(File);
webidl.converters["FilePropertyBag"] = webidl.createDictionaryConverter(
"FilePropertyBag",
blobPropertyBagDictionary,
[
{
key: "lastModified",
converter: webidl.converters["long long"],
},
],
);
/**
* This is a blob backed by a file on disk, with minimal
* requirements. It is wrapped inside a Blob as a blobPart, so you
* have no direct access to it.
*
* @author Jimmy Wärting
* @private
*/
class BlobDataItem extends Blob {
#path;
#start;
constructor(options) {
super();
this.#path = options.path;
this.#start = options.start;
// Blob.prototype.size only has a getter, so a plain assignment would
// throw under "use strict"; define an own data property instead.
Object.defineProperty(this, "size", { value: options.size });
this.lastModified = options.lastModified;
}
/**
* Slicing arguments is first validated and formatted
* to not be out of range by Blob.prototype.slice
*/
slice(start, end) {
return new BlobDataItem({
path: this.#path,
lastModified: this.lastModified,
size: end - start,
start
});
}
async * stream() {
const { mtime } = await Deno.stat(this.#path);
if (mtime > this.lastModified) {
throw new DOMException('The requested file could not be read, typically due to permission problems that have occurred after a reference to a file was acquired.', 'NotReadableError');
}
if (this.size) {
const f = await Deno.open(this.#path, { read: true });
await f.seek(this.#start, Deno.SeekMode.Start);
// Read only `size` bytes so a sliced view doesn't run to end of file.
let remaining = this.size;
for await (const chunk of Deno.iter(f)) {
if (chunk.byteLength >= remaining) {
yield chunk.subarray(0, remaining);
break;
}
remaining -= chunk.byteLength;
yield chunk;
}
f.close();
}
}
}
// TODO: Make this function public
/** @returns {Promise<File>} */
async function getFile (path, type = '') {
const stat = await Deno.stat(path);
const blobDataItem = new BlobDataItem({
path,
size: stat.size,
lastModified: stat.mtime.getTime(),
start: 0
});
// TODO: import basename from std/path; naive inline stopgap for now:
const basename = (p) => p.replace(/[/\\]+$/, "").split(/[/\\]/).pop();
const file = new File([blobDataItem], basename(path), {
type, lastModified: blobDataItem.lastModified
});
return file;
}
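// Example usage (illustrative; getFile is not actually exposed yet):
//   const file = await getFile("/tmp/data.bin", "application/octet-stream");
//   console.log(file.name, file.size, file.lastModified);
//   for await (const chunk of file.stream()) { /* streamed from disk */ }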
window.__bootstrap.file = {
Blob,
_byteSequence,
getFile, // TODO: expose somehow? Write doc?
File,
};
})(this);
-
@lucacasonato Hi Luca, did you manage to do anything regarding this? I want to understand whether I still need to fully read the file content into memory to use it in a Blob.
-
👋 Just found this thread as I'm about to write my own Blob adapter for FsFile. LMK if I've just missed an implementation in std somewhere. 😅
-
Hi, I'm coming from the Node.js world and help maintain node-fetch and fetch-blob.
I have also helped with and reviewed Node.js's buffer.Blob implementation so that it follows the spec more closely. Node.js is now looking into adding something like our blobFrom(path) implementation from our fetch-blob/from.js and creating something like fs.blobFrom(path).
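For reference, a small usage sketch (assuming the API shape of fetch-blob v3's from.js; check the version you install):

import { blobFrom } from "fetch-blob/from.js";

// Stats the file up front; bytes are read lazily only when consumed.
const blob = await blobFrom("./video.mp4", "video/mp4");
for await (const chunk of blob.stream()) {
  // handle chunk without ever buffering the whole file in memory
}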
I hope to migrate to Deno sometime, and I would like to see your Blob implementation improved. I also hope to see something like the native file system spec implemented, so it would be possible to retrieve a File from the file system and have Blobs and Files backed by the file system that actually extend the Blob class (your Blob impl can't really do that atm).
There are a few things about your Blob implementation that I have some thoughts on...
The first and very simple thing that is missing is Symbol.toStringTag on both the Blob and the File class.
When the browser creates Blobs/Files from the filesystem it merely stats the file for its size and last-modified date and creates a dummy blob (called BlobDataItem in Chromium) that doesn't hold any underlying data in memory.
When you slice this file you are not really copying/reading the data.
The only thing you should actually do is create a new Blob with a reference to the old Blob, adjusting the size and the offsets for where it should start and stop reading (see the sketch below).
We also used to treat our blob data as a single buffer, but that all changed when we instead operated on a sequence of blob parts and had to rewrite the whole slice method, because parts now came from the file system and not only from memory.
The benefit was reduced memory use, and slicing an in-memory blob no longer cloned the data; it only created a new reference for where to start/stop reading from another blob. This is more or less how the browser does it behind the scenes.
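For illustration, a minimal sketch of that copy-free slicing idea (all names hypothetical):

class DiskBackedPart {
  constructor(path, offset, size) {
    this.path = path; // file on disk; no bytes held in memory
    this.offset = offset; // where reading starts
    this.size = size; // how many bytes this part spans
  }
  slice(start = 0, end = this.size) {
    // No I/O and no copying: just a new view with adjusted offset/size.
    return new DiskBackedPart(this.path, this.offset + start, end - start);
  }
}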