JavaScript specifies a lot of built-in functionality which every V8 context must provide. For example, you can run Math.PI and that will work in a JavaScript console/repl.
The global object and all the built-in functionality must be setup and initialized into the V8 heap. This can be time consuming and affect runtime performance if this has to be done every time.
Now this is where the file snapshot_blob.bin
comes into play.
But what is this bin file?
This blob is a prepared snapshot that gets directly deserialized into the heap
to provide an initilized context.
When V8 is built with v8_use_external_startup_data
the build process will
create a snapshot_blob.bin file using a template in BUILD.gn named run_mksnapshot
, but if false it will generate a file named snapshot.cc.
This executable is used to create snapshot and does so starting a V8 Isolate and then serializing runtime data into binary format.
There is an executable named mksnapshot
which is defined in
src/snapshot/mksnapshot.cc
. This can be executed on the command line:
$ mksnapshot --help
Usage: ./out/x64.release_gcc/mksnapshot [--startup-src=file] [--startup-blob=file] [--embedded-src=file] [--embedded-variant=label] [--target-arch=arch] [--target-os=os] [extras]
But the strange things is that the usage is never printed, instead the normal
V8/D8 options. If we look in src/snapshot/snapshot.cc
we can see:
// Print the usage if an error occurs when parsing the command line
// flags or if the help flag is set.
int result = i::FlagList::SetFlagsFromCommandLine(&argc, argv, true, false);
if (result > 0 || (argc > 3) || i::FLAG_help) {
::printf("Usage: %s --startup_src=... --startup_blob=... [extras]\n",
argv[0]);
i::FlagList::PrintHelp();
return !i::FLAG_help;
}
SetFlagsFromCommandLine
is defined in src/flags/flags.cc
:
int FlagList::SetFlagsFromCommandLine(int* argc, char** argv, bool remove_flags) {
...
if (FLAG_help) {
PrintHelp();
exit(0);
}
}
So even if we specify `--help`` the usage string for snapshot will not be printed. As a suggetion I've created https://chromium-review.googlesource.com/c/v8/v8/+/2290852 to address this.
With this change --help
will print the usage message:
$ ./out/x64.release_gcc/mksnapshot --help | head -n 1
Usage: ./out/x64.release_gcc/mksnapshot --startup_src=... --startup_blob=... [extras]
What mksnapshot does is much the same start up of a normal V8 application, it
has to initialize V8 itself (src/snapshot/snapshot.cc
):
v8::V8::InitializeICUDefaultLocation(argv[0]);
std::unique_ptr<v8::Platform> platform = v8::platform::NewDefaultPlatform();
v8::V8::InitializePlatform(platform.get());
v8::V8::Initialize();
Notice that this is just was we have in hello-world or in v8_test_fixture.h.
Next, we have:
SnapshotFileWriter snapshot_writer;
snapshot_writer.SetSnapshotFile(i::FLAG_startup_src);
snapshot_writer.SetStartupBlobFile(i::FLAG_startup_blob);
i::FLAG_startup_src
is defined in src/flags/flag-definitions.h
:
DEFINE_STRING(startup_src, nullptr,
"Write V8 startup as C++ src. (mksnapshot only)")
#define DEFINE_STRING(nam, def, cmt) FLAG(STRING, const char*, nam, def, cmt)
#undef FLAG
#ifdef ENABLE_GDB_JIT_INTERFACE
#define FLAG FLAG_FULL
#else
#define FLAG FLAG_READONLY
#endif
#define FLAG_FULL(ftype, ctype, nam, def, cmt) \
V8_EXPORT_PRIVATE extern ctype FLAG_##nam;
#define FLAG_READONLY(ftype, ctype, nam, def, cmt) \
static constexpr ctype FLAG_##nam = def;
So the above macro would be expanded by the preprocessor into:
V8_EXPORT_PRIVATE extern const char* FLAG_startup_src;
The mksnapshot executable takes a number of arguments one of which is the target
OS which can be different from the host OS. Examples of targets can be found
in BUILD.gn
and are android
, fuchsia
, ios
, linux
, mac
, and win
.
There is a template named run_mksnapshot
in BUILD.gn
that shows how
mksnapshot
is called.
If v8_use_external_startup_data
is set in args.gn, the option --startup_blob
will be passed to mksnapshot:
if (v8_use_external_startup_data) {
outputs += [ "$root_out_dir/snapshot_blob${suffix}.bin" ]
data += [ "$root_out_dir/snapshot_blob${suffix}.bin" ]
args += [
"--startup_blob",
rebase_path("$root_out_dir/snapshot_blob${suffix}.bin", root_build_dir),
]
And this binary file can then be deserialized. If v8_use_external_startup_data
is not set --startup_src
will be passed instead:
} else {
outputs += [ "$target_gen_dir/snapshot${suffix}.cc" ]
args += [
"--startup_src",
rebase_path("$target_gen_dir/snapshot${suffix}.cc", root_build_dir),
]
}
To see what is actually getting used when building one can look at the file
out/x64.release_gcc/toolchain.ninja
:
rule ___run_mksnapshot_default___build_toolchain_linux_x64__rule
command = python ../../tools/run.py ./mksnapshot --turbo_instruction_scheduling
--target_os=linux
--target_arch=x64
--embedded_src gen/embedded.S
--embedded_variant Default
--random-seed 314159265
--startup_src gen/snapshot.cc
--native-code-counters
--verify-heap
description = ACTION //:run_mksnapshot_default(//build/toolchain/linux:x64)
restat = 1
pool = build_toolchain_action_pool
This option specifies a path to a C++ source file that will be generated by mksnapshot, for example:
$ ../v8_src/v8/out/x64.release_gcc/mksnapshot --startup_src=gen/example_snapshot.cc
After this we can inspect gen/example_snapshot.cc
which will contain the
serialized snapshot in a byte array named blob_data
.
#include "src/snapshot/snapshot.h"
namespace v8 {
namespace internal {
alignas(kPointerAlignment) static const byte blob_data[] = {
...
};
static const int blob_size = 133115;
static const v8::StartupData blob = { (const char*) blob_data, blob_size };
const v8::StartupData* Snapshot::DefaultSnapshotBlob() {
return &blob;
}
This file is generated by MaybeWriteSnapshotFile
.
This is also a command line option that can be passed to mksnapshot
which
will generate a binary file in the path specified. Note that this can be
specified in addition to startup-src
which I was not aware of:
$ ../v8_src/v8/out/x64.release_gcc/mksnapshot --startup_src=gen/example-snapshot.cc --startup-blob=gen/example-blob.bin
$ ls -l -h gen/example-blob.bin
-rw-rw-r--. 1 danielbevenius danielbevenius 130K Jul 14 09:52 gen/example-blob.bin
This binary can then be specified using:
V8::InitializeExternalStartupDataFromFile("../gen/example-blob.bin");
src/init/startup-data-util.cc
contains the implementation for this and notice
that it is guarded by a macro:
void InitializeExternalStartupDataFromFile(const char* snapshot_blob) {
#ifdef V8_USE_EXTERNAL_STARTUP_DATA
LoadFromFile(snapshot_blob);
#endif // V8_USE_EXTERNAL_STARTUP_DATA
}
So we need to make sure that we have set v8_use_external_startup_data
to true
in args.gn
and rebuild v8.
There is an example, snapshot_test that explores using the binary blob.
This is also a command line option that can be passed to mksnapshot
.
$ ../v8_src/v8/out/x64.release_gcc/mksnapshot --startup_src=gen/example-snapshot.cc --startup-blob=gen/example-blob.bin --embedded-src=gen/example-embedded.S
src/snapshot/embedded/embedded-file-writer.h
declares the class responsible
for generating this file.
How is this file used? TODO: explain how this is used.
Not to be confused with embedded-src
this is any extra javascript that can
be passed as the extra
option argument on the command line to mksnapshot.
$ ../v8_src/v8/out/x64.release_gcc/mksnapshot --startup_src=gen/example-snapshot.cc --startup-blob=gen/example-blob.bin --embedded-src=gen/example-embedded.S lib/embed-script.js
Loading script for embedding: lib/embed-script.js
To understand what is actually happening
snapshot_test has a test named CreateSnapshot
which creates a creates a V8 environment (initializes the Platform, creates
an Isolate, and a Context), adds a function named test_snapshot
to the
Context and then creates a snapshot using SnapshotCreator.
SnapshotCreator creates the binary in a class named StartupData
which can be
found in v8.h
:
class V8_EXPORT StartupData {
bool CanBeRehashed() const;
bool IsValid() const;
const char* data;
int raw_size;
};
After creating the StartupData (which is called a blob (Binary Large Object) the V8 environment is disposed of. Next, a new V8 environment is created but this time the Context is created using the StartupData:
Isolate::CreateParams create_params;
// Specify the startup_data blob created earlier.
create_params.snapshot_blob = &startup_data;
isolate = Isolate::New(create_params);
Local<Context> context = Context::FromSnapshot(isolate, index).ToLocalChecked();
After this we are going to call the function that we serialized previously and verify that that works.
So that was a JavasCript function that was serialized into a snapshot blob and then deserialized.
Note, that the tests in snapshot_test will contain a bit of duplicated code which is intentional as I think it makes it easer to see what is required to perform these tasks plus this is only for learning.
This function can be used to deserialize a blob into a Context. This was used in the CreateSnapshot test:
Local<Context> context = Context::FromSnapshot(isolate, index).ToLocalChecked();
We only passed in the isolate and the index above, but FromSnapshot
also takes
other arguments that have default values when not specified:
static MaybeLocal<Context> FromSnapshot(
Isolate* isolate, size_t context_snapshot_index,
DeserializeInternalFieldsCallback embedder_fields_deserializer =
DeserializeInternalFieldsCallback(),
ExtensionConfiguration* extensions = nullptr,
MaybeLocal<Value> global_object = MaybeLocal<Value>(),
MicrotaskQueue* microtask_queue = nullptr);
In our case Context::FromSnapshot will call:
return NewContext(external_isolate, extensions, MaybeLocal<ObjectTemplate>(),
global_object, index_including_default_context,
embedder_fields_deserializer, microtask_queue)
This will trickle down into bootstrapper.cc which has the following call:
if (isolate->initialized_from_snapshot()) {
Handle<Context> context;
if (Snapshot::NewContextFromSnapshot(isolate, global_proxy,
context_snapshot_index,
embedder_fields_deserializer)
.ToHandle(&context)) {
native_context_ = Handle<NativeContext>::cast(context);
}
}
This call will land in snapshot.cc:
MaybeHandle<Context> Snapshot::NewContextFromSnapshot(
Isolate* isolate, Handle<JSGlobalProxy> global_proxy, size_t context_index,
v8::DeserializeEmbedderFieldsCallback embedder_fields_deserializer) {
...
const v8::StartupData* blob = isolate->snapshot_blob();
bool can_rehash = ExtractRehashability(blob);
Vector<const byte> context_data = SnapshotImpl::ExtractContextData(
blob, static_cast<uint32_t>(context_index));
SnapshotData snapshot_data(MaybeDecompress(context_data));
MaybeHandle<Context> maybe_result = ContextDeserializer::DeserializeContext(
isolate, &snapshot_data, can_rehash, global_proxy,
embedder_fields_deserializer);
Handle<Context> result;
if (!maybe_result.ToHandle(&result)) return MaybeHandle<Context>();
return result;
}
In this case the most interesting part is the ContextDeserializer::DeserializeContext call (src/snapshot/context-deserializer.cc):
MaybeHandle<Context> ContextDeserializer::DeserializeContext(
Isolate* isolate, const SnapshotData* data, bool can_rehash,
Handle<JSGlobalProxy> global_proxy,
v8::DeserializeEmbedderFieldsCallback embedder_fields_deserializer) {
ContextDeserializer d(isolate, data, can_rehash);
MaybeHandle<Object> maybe_result =
d.Deserialize(isolate, global_proxy, embedder_fields_deserializer);
Handle<Object> result;
return maybe_result.ToHandle(&result) ? Handle<Context>::cast(result)
: MaybeHandle<Context>();
}
d.Deserialize
will call ContextDeserializer::Deserialize
.
When we have functions that are not defined in V8 itself these functions will have addresses that V8 does not know about. The function will be a symbol that needs to be resolved when V8 deserializes a blob.
These symbols are serialized and when V8 deserializes a snapshot it will try
to match symbols to addresses. This is done using external_references
which
is a null terminated vector which is populated with the addresses to functions
and then passed into SnapshotCreator:
std::vector<intptr_t> external_refs;
external_refs.push_back(reinterpret_cast<intptr_t>(ExternalRefFunction));
...
isolate = Isolate::Allocate();
SnapshotCreator snapshot_creator(isolate, external_refs.data());
After creating the blob and perhaps storing it in a file, it can later be used to create a new Context:
Isolate::CreateParams create_params;
create_params.snapshot_blob = &startup_data;
create_params.external_references = external_refs.data();
Isolate* isolate = Isolate::New(create_params);
...
Local<Context> context = Context::FromSnapshot(isolate, index).ToLocalChecked();
There is a test case that explores this feature and it can be run using:
$ ./test/snapshot_test --gtest_filter=SnapshotTest.ExternalReference
One thing to keep in mind is that normally the creation of a snapshot would be done in a separate process and stored on an external device like the file system. In the tests everything is in the same process so the functions address will remains the same and not change and one might ask why does this have to be done at all, but the reason is that the function address would change.
So how does the SnapshotCreator use this list of addresses?
Well this must be done during serialization, and to find out where we can comment
out the the adding of the external reference in our test. Doing that we'll find
that the following error is displayed:
Unknown external reference 0x419822.
So we can set a break point where this is generated:
(lldb) br s -f external-reference-encoder.cc -l 74
When an ExternalReferenceEncoder is created it is passed an isolate (src/codegen/external-reference-encoder.cc).
(lldb) br s -f external-reference-encoder.cc -l 34
ExternalReferenceEncoder::ExternalReferenceEncoder(Isolate* isolate) {
...
map_ = isolate->external_reference_map();
if (map_ != nullptr) return;
map_ = new AddressToIndexHashMap();
isolate->set_external_reference_map(map_);
// Add V8's external references.
ExternalReferenceTable* table = isolate->external_reference_table();
for (uint32_t i = 0; i < ExternalReferenceTable::kSize; ++i) {
Address addr = table->address(i);
// Ignore duplicate references.
// This can happen due to ICF. See http://crbug.com/726896.
if (map_->Get(addr).IsNothing()) map_->Set(addr, Value::Encode(i, false));
DCHECK(map_->Get(addr).IsJust());
}
// Add external references provided by the embedder.
const intptr_t* api_references = isolate->api_external_references();
if (api_references == nullptr) return;
for (uint32_t i = 0; api_references[i] != 0; ++i) {
Address addr = static_cast<Address>(api_references[i]);
// Ignore duplicate references.
// This can happen due to ICF. See http://crbug.com/726896.
if (map_->Get(addr).IsNothing()) map_->Set(addr, Value::Encode(i, true));
DCHECK(map_->Get(addr).IsJust());
}
}
Notice that the map_
is retrieved from the Isolate which is of type
AddressToIndexHashMap:
class AddressToIndexHashMap : public PointerToIndexHashMap<Address> {};
Now, in our case map_
is nullptr so a new map is created and also set
on the isolate.
Then, V8's external references are added to the newly created map and after that we have our api_references that we set:
(lldb) expr addr
(v8::internal::Address) $0 = 4298802
Which matches the address of ExternalRefFunction we print in the test:
address of ExternalRefFunction function: 4298802
(lldb) expr map_->Get(addr)
(v8::Maybe<unsigned int>) $9 = (has_value_ = true, value_ = 2147483648)
So that is how at a later stage it is possible to reference:
(lldb) expr isolate->external_reference_map()->Get(4298802)
(v8::Maybe<unsigned int>) $11 = (has_value_ = true, value_ = 2147483648)
TODO: How is the actually used?
So after we have created the StartupData blob and then use it when creating a new Isolate, the blob will be deserialised. In Deserializer::ReadSingleBytecodeData we then have:
case kApiReference: {
uint32_t reference_id = static_cast<uint32_t>(source_.GetInt());
Address address;
if (isolate()->api_external_references()) {
address = static_cast<Address>(
isolate()->api_external_references()[reference_id]);
} else {
address = reinterpret_cast<Address>(NoExternalReferencesCallback);
}
if (V8_HEAP_SANDBOX_BOOL && data == kSandboxedApiReference) {
return WriteExternalPointer(slot_accessor.slot(), address,
kForeignForeignAddressTag);
} else {
DCHECK(!V8_HEAP_SANDBOX_BOOL);
return WriteAddress(slot_accessor.slot(), address);
}
}
(lldb) expr reference_id
(uint32_t) $15 = 0
(lldb) expr isolate()->api_external_references()[0]
(intptr_t) $18 = 4298802
So in the blob the current source_ position is of type kApiReference
type and
the value is an index into the api_external_references array. This address
is then written to the using WriteAddress which does a memory copy:
template <typename TSlot>
int Deserializer::WriteAddress(TSlot dest, Address value) {
base::Memcpy(dest.ToVoidPtr(), &value, kSystemPointerSize);
return (kSystemPointerSize / TSlot::kSlotDataSize);
}
So 4298802 (the value of the passed in Address which is also the pointer to our external function) will be written to the destination slot in memory.
Just to recap this a little and remember that I'm using a single test here but in a real world situation the StartupData would have been written to some external storage like the file system. Another program would then use the blob and the address of the external function would most likely be a different address, so this is where V8 need to link this with the correct address of the function. In the ExternalReference test we now have two different functions being used. One is used in the snapshot creation and the other is used when creating a new Isolate using the blob. We can see that the second function will be called in this case.
In the constructor of SnapshotCreator we have the following code:
SnapshotCreator::SnapshotCreator(Isolate* isolate,
const intptr_t* external_references,
StartupData* existing_snapshot) {
SnapshotCreatorData* data = new SnapshotCreatorData(isolate)
}
SnapshotCreatorData
is an internal struct that has the following members:
explicit SnapshotCreatorData(Isolate* isolate)
: isolate_(isolate),
default_context_(),
contexts_(isolate),
created_(false) {}
ArrayBufferAllocator allocator_;
Isolate* isolate_;
Persistent<Context> default_context_;
SerializeInternalFieldsCallback default_embedder_fields_serializer_;
PersistentValueVector<Context> contexts_;
std::vector<SerializeInternalFieldsCallback> embedder_fields_serializers_;
bool created_;
Next, in the constructor we have:
i::Isolate* internal_isolate = reinterpret_cast<i::Isolate*>(isolate);
internal_isolate->set_array_buffer_allocator(&data->allocator_);
internal_isolate->set_api_external_references(external_references);
internal_isolate->enable_serializer();
isolate->Enter();
The casting to the internal isolate is something we have seen before. After that
the arraybuffer allocator will be set followed by the external references.
set_api_external_references
is generated by the preprocessor macro
ISOLATE_INIT_LIST
which can be found in src/execution/isolate.h:
#define GLOBAL_ACCESSOR(type, name, initialvalue) \
inline type name() const { \
DCHECK(OFFSET_OF(Isolate, name##_) == name##_debug_offset_); \
return name##_; \
} \
inline void set_##name(type value) { \
DCHECK(OFFSET_OF(Isolate, name##_) == name##_debug_offset_); \
name##_ = value; \
}
ISOLATE_INIT_LIST(GLOBAL_ACCESSOR)
#undef GLOBAL_ACCESSOR
#define ISOLATE_INIT_LIST(V) \
...
V(const intptr_t*, api_external_references, nullptr) \
And this is simple generating a getter named api_external_references
and the
setter will be named set_api_external_references
:
(lldb) expr internal_isolate->api_external_references()
(const intptr_t *) $1 = 0x0000000000495cf0
Next, the serializer is enabled and the Isolate Entered followed by the creation of a StartupData pointer:
const StartupData* blob = existing_snapshot
? existing_snapshot
: i::Snapshot::DefaultSnapshotBlob();
In our case there is no existing_snapshot so the default snapshot blob will be
used. Now, the DefaultSnapshotBlob was created by V8 using mksnapshot
which
like node_mkshapshot also produces a C++ file, this one is names snapshot.cc,
and we are setting the StartupData pointer to point to it.
After that the StartupData pointer is set on the internal_isolate and
Snapshot::Initialize is called:
internal_isolate->set_snapshot_blob(blob);
i::Snapshot::Initialize(internal_isolate);
Snapshot::Initialize can be found in src/snapshot/snapshot.cc:
const v8::StartupData* blob = isolate->snapshot_blob();
SnapshotImpl::CheckVersion(blob);
CHECK(VerifyChecksum(blob));
Vector<const byte> startup_data = SnapshotImpl::ExtractStartupData(blob);
Vector<const byte> read_only_data = SnapshotImpl::ExtractReadOnlyData(blob);
SnapshotData startup_snapshot_data(MaybeDecompress(startup_data));
SnapshotData read_only_snapshot_data(MaybeDecompress(read_only_data));
bool success = isolate->InitWithSnapshot(&startup_snapshot_data,
&read_only_snapshot_data,
ExtractRehashability(blob));
InitWithSnapshot
will call Init
#define ASSIGN_ELEMENT(CamelName, hacker_name) \
isolate_addresses_[IsolateAddressId::k##CamelName##Address] = \
reinterpret_cast<Address>(hacker_name##_address());
FOR_EACH_ISOLATE_ADDRESS_NAME(ASSIGN_ELEMENT)
#undef ASSIGN_ELEMENT
This can be expanded using the preprocessor:
$ g++ -Iout/learning_v8/gen -Iinclude -I. -E src/execution/isolate.cc
isolate_addresses_[IsolateAddressId::kHandlerAddress] = reinterpret_cast<Address>(handler_address());
isolate_addresses_[IsolateAddressId::kCEntryFPAddress] = reinterpret_cast<Address>(c_entry_fp_address());
isolate_addresses_[IsolateAddressId::kCFunctionAddress] = reinterpret_cast<Address>(c_function_address());
isolate_addresses_[IsolateAddressId::kContextAddress] = reinterpret_cast<Address>(context_address());
isolate_addresses_[IsolateAddressId::kPendingExceptionAddress] =
reinterpret_cast<Address>(pending_exception_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerContextAddress] =
reinterpret_cast<Address>(pending_handler_context_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerEntrypointAddress] =
reinterpret_cast<Address>(pending_handler_entrypoint_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerConstantPoolAddress] =
reinterpret_cast<Address>(pending_handler_constant_pool_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerFPAddress] =
reinterpret_cast<Address>(pending_handler_fp_address());
isolate_addresses_[IsolateAddressId::kPendingHandlerSPAddress] =
reinterpret_cast<Address>(pending_handler_sp_address());
isolate_addresses_[IsolateAddressId::kExternalCaughtExceptionAddress] =
reinterpret_cast<Address>(external_caught_exception_address());
isolate_addresses_[IsolateAddressId::kJSEntrySPAddress] =
reinterpret_cast<Address>(js_entry_sp_address());
Next CodeRanges will be initialized.
InitializeCodeRanges();
I'm going to skip along here a little as I'm most interested in the external references and their usage in this function:
isolate_data_.external_reference_table()->Init(this);
So isolate_data has a pointer to an ExternalReferenceTable and init
will land
in src/codegen/external-reference-table.cc:
void ExternalReferenceTable::Init(Isolate* isolate) {
int index = 0;
// kNullAddress is preserved through serialization/deserialization.
Add(kNullAddress, &index);
AddReferences(isolate, &index);
AddBuiltins(&index);
AddRuntimeFunctions(&index);
AddIsolateAddresses(isolate, &index);
AddAccessors(&index);
AddStubCache(isolate, &index);
AddNativeCodeStatsCounters(isolate, &index);
is_initialized_ = static_cast<uint32_t>(true);
CHECK_EQ(kSize, index);
}
Lets take a look at these Add functions and see what they are doing.
First, ref_addr_
is an array of v8::internal::Address's:
void ExternalReferenceTable::Add(Address address, int* index) {
ref_addr_[(*index)++] = address;
}
(lldb) expr ref_addr_
(v8::internal::Address [986]) $14 = {
[0] = 16045690975706529519
[1] = 16045690975706529519
And the first entry in this array will be set to the kNullAddress, after the first call to Add:
(v8::internal::Address [986]) $18 = {
[0] = 0
[1] = 16045690975706529519
Next, we have:
AddReferences(isolate, &index);
AddReferences has a few macros which can be expanded using:
$ g++ -Iout/learning_v8/gen -Iinclude -I. -E src/codegen/external-reference-table.cc
Lets take a concrete example:
Add(ExternalReference::abort_with_reason().address(), index);
(lldb) expr ExternalReference::abort_with_reason
(v8::internal::ExternalReference (*)()) $19 = 0x00007ffff610069e (libv8.so`v8::internal::ExternalReference::abort_with_reason() at external-reference.cc:420:1)
And if we look in external-reference.cc we find:
FUNCTION_REFERENCE(abort_with_reason, i::abort_with_reason)
ExternalReference ExternalReference::abort_with_reason() {
static_assert(IsValidExternalReferenceType<decltype(&i::abort_with_reason)>::value, "IsValidExternalReferenceType<decltype(&i::abort_with_reason)>::value");
return ExternalReference(Redirect((reinterpret_cast<v8::internal::Address>(i::abort_with_reason))));
}
Now, abort_with_reason
can be found in src/codegen/external-reference.cc:
void abort_with_reason(int reason) {
if (IsValidAbortReason(reason)) {
const char* message = GetAbortReason(static_cast<AbortReason>(reason));
base::OS::PrintError("abort: %s\n", message);
} else {
base::OS::PrintError("abort: <unknown reason: %d>\n", reason);
}
base::OS::Abort();
V8_Fatal("unreachable code");
}
So what is happening here is that this function is being added to the array of Address's at location 1. Remember that V8 also supports snapshots and has the command mksnapshot which creates a snapshot. It too has functions addresses that need to be resolved when the snapshot is loaded.
But we have not seen any of the external references that we passed in, these are still V8's references as far as I can tell. And also notice that these are added to the Isolate when it is created. When we later create a context is when we pass in the StartupData:
Local<Context> context = Context::FromSnapshot(isolate, index).ToLocalChecked();
After we have returned from this function we will be back in Isolate::Init and after stepping through a little we come to:
if (create_heap_objects) {
heap_.read_only_space()->ClearStringPaddingIfNeeded();
read_only_heap_->OnCreateHeapObjectsComplete(this);
} else {
StartupDeserializer startup_deserializer(this, startup_snapshot_data, can_rehash);
startup_deserializer.DeserializeIntoIsolate();
}
So we are creating a new StartupDeserializer and passing in the current Isolate, the startup_snapshot_data.
Now in Deserializer::Deserializer we find the following:
#ifdef DEBUG
num_api_references_ = 0;
// The read-only deserializer is run by read-only heap set-up before the
// heap is fully set up. External reference table relies on a few parts of
// this set-up (like old-space), so it may be uninitialized at this point.
if (isolate->isolate_data()->external_reference_table()->is_initialized()) {
// Count the number of external references registered through the API.
if (isolate->api_external_references() != nullptr) {
while (isolate->api_external_references()[num_api_references_] != 0) {
num_api_references_++;
}
}
}
This is only for debug but these are the references that were set earlier in IsolateData:
internal_isolate->set_api_external_references(external_references);
After this we the StartupSerializer constructor is finished and the next line ( now aback in isolate.cc and Isolate::Init) we have:
startup_deserializer.DeserializeIntoIsolate();
In src/snapshot/startup-deserializer.cc we find the following:
void StartupDeserializer::DeserializeIntoIsolate() {
{
...
{
isolate()->heap()->IterateSmiRoots(this);
isolate()->heap()->IterateRoots(
this,
base::EnumSet<SkipRoot>{SkipRoot::kUnserializable, SkipRoot::kWeak});
Iterate(isolate(), this);
DeserializeStringTable();
isolate()->heap()->IterateWeakRoots(
this, base::EnumSet<SkipRoot>{SkipRoot::kUnserializable});
DeserializeDeferredObjects();
for (Handle<AccessorInfo> info : accessor_infos()) {
RestoreExternalReferenceRedirector(isolate(), info);
}
for (Handle<CallHandlerInfo> info : call_handler_infos()) {
RestoreExternalReferenceRedirector(isolate(), info);
}
}
}
StartUpDeserializer extends Deserializer which extends SerializerDeserializer which in turn extends RootVisitor. And in Deserializer::VisitRootPointers we can find:
void Deserializer::ReadData(FullMaybeObjectSlot start,
FullMaybeObjectSlot end) {
FullMaybeObjectSlot current = start;
while (current < end) {
byte data = source_.Get();
current += ReadSingleBytecodeData(data, SlotAccessorForRootSlots(current));
}
CHECK_EQ(current, end);
}
If we look at ReadSingleBytecodeData we can find rather large switch statement that takes the data passed in:
switch (data) {
....
case kApiReference: {
uint32_t reference_id = static_cast<uint32_t>(source_.GetInt());
Address address;
if (isolate()->api_external_references()) {
DCHECK_WITH_MSG(reference_id < num_api_references_,
"too few external references provided through the API");
address = static_cast<Address>(
isolate()->api_external_references()[reference_id]);
} else {
address = reinterpret_cast<Address>(NoExternalReferencesCallback);
}
if (V8_HEAP_SANDBOX_BOOL && data == kSandboxedApiReference) {
return WriteExternalPointer(slot_accessor.slot(), address,
kForeignForeignAddressTag);
} else {
DCHECK(!V8_HEAP_SANDBOX_BOOL);
return WriteAddress(slot_accessor.slot(), address);
}
}
This is where the references that we passed come into play. If the instance_type
of the Map of the HeapObject that is being read from the snapshot is of type
kApiReference
then use the value reference_id as the index into the
api_external_references array.
(lldb) br s -f deserializer.cc -l 1016
We looked at how external functions can be handled when creating a snapshot, but we also have to be able to deal with things like classes and be able to serialize/deserialize them (at least that is my understading so far).
When adding a Context to SnapshotCreator there is a parameter named
callback
which is of type SerializeInternalFieldsCallback
:
size_t AddContext(Local<Context> context,
SerializeInternalFieldsCallback callback = SerializeInternalFieldsCallback());
struct SerializeInternalFieldsCallback {
typedef StartupData (*CallbackFunction)(Local<Object> holder, int index,
void* data);
SerializeInternalFieldsCallback(CallbackFunction function = nullptr,
void* data_arg = nullptr)
: callback(function), data(data_arg) {}
CallbackFunction callback;
void* data;
};
An implementation can be provided by creating a new instance of this struct and passing in the data that should be passed to it's callback function.
When is the callback called? There is a test case that demonstrates this and it has a callback in which a break point can be set:
./test/snapshot_test --gtest_filter=SnapshotTest.InternalFields
src/snapshot/context-serializer.cc
contains the following function:
bool ContextSerializer::SerializeJSObjectWithEmbedderFields(
Handle<HeapObject> obj) {
Lets stick a break point in that function and work backwards from there:
(lldb) br s -f context-serializer.cc -l 208
Starting from Snapshot::Create, which will iterate over all the contexts passed in:
v8::StartupData Snapshot::Create(
Isolate* isolate, std::vector<Context>* contexts,
const std::vector<SerializeInternalFieldsCallback>& embedder_fields_serializers,
const DisallowGarbageCollection& no_gc, SerializerFlags flags) {
const int num_contexts = static_cast<int>(contexts->size());
for (int i = 0; i < num_contexts; i++) {
ContextSerializer context_serializer(isolate, flags, &startup_serializer,
embedder_fields_serializers[i]);
context_serializer.Serialize(&contexts->at(i), no_gc);
can_be_rehashed = can_be_rehashed && context_serializer.can_be_rehashed();
context_snapshots.push_back(new SnapshotData(&context_serializer));
}
}
Notice that each context will have a ContextSerializer created for it, and it gets passed the SerializeInternalFieldsCallback and Serialized will be called on that instance passing in the first context. So that leads us to ContextSeralizer::Serialize:
void ContextSerializer::Serialize(Context* o, const DisallowGarbageCollection& no_gc) {
```c++
VisitRootPointer(Root::kStartupObjectCache, nullptr, FullObjectSlot(o));
The second argument is for the description (cont char*). VisitRootPointers can be found in visitors.h and will delegate to Serializer::VisitRootPointers: (src/snapshot/serializer.cc)
void Serializer::VisitRootPointers(Root root, const char* description,
FullObjectSlot start, FullObjectSlot end) {
for (FullObjectSlot current = start; current < end; ++current) {
SerializeRootObject(current);
}
}
And SerializeRootObject will call SerializeObject in our case:
void Serializer::SerializeRootObject(FullObjectSlot slot) {
Object o = *slot;
if (o.IsSmi()) {
PutSmiRoot(slot);
} else {
SerializeObject(Handle<HeapObject>(slot.location()));
}
}
void Serializer::SerializeObject(Handle<HeapObject> obj) {
// ThinStrings are just an indirection to an internalized string, so elide the
// indirection and serialize the actual string directly.
if (obj->IsThinString(isolate())) {
obj = handle(ThinString::cast(*obj).actual(isolate()), isolate());
}
SerializeObjectImpl(obj);
}
This will bring us into ContextSerializer::SerializeObjectImpl (src/snapshot/context-serializer.cc):
void ContextSerializer::SerializeObjectImpl(Handle<HeapObject> obj) {
...
if (SerializeJSObjectWithEmbedderFields(obj)) {
return;
}
...
}
SerializeJSObjectWithEmbedderFields
(src/snapshot/context-serializer.cc):
bool ContextSerializer::SerializeJSObjectWithEmbedderFields(Handle<HeapObject> obj) {
...
std::vector<StartupData> serialized_data;
for (int i = 0; i < embedder_fields_count; i++) {
EmbedderDataSlot embedder_data_slot(*js_obj, i);
original_embedder_values.emplace_back(embedder_data_slot.load_raw(isolate(), no_gc));
Object object = embedder_data_slot.load_tagged();
if (object.IsHeapObject()) {
serialized_data.push_back({nullptr, 0});
} else {
if (serialize_embedder_fields_.callback == nullptr && object == Smi::zero()) {
serialized_data.push_back({nullptr, 0});
} else {
StartupData data = serialize_embedder_fields_.callback(
api_obj, i, serialize_embedder_fields_.data);
serialized_data.push_back(data);
}
}
}
Notice that this call to the callback, serialize_embedder_fields_.callback
returns an instance of StartupData, which is then added to the serialized_data
vector:
class V8_EXPORT StartupData {
public:
bool CanBeRehashed() const;
bool IsValid() const;
const char* data;
int raw_size;
};
A little further down we then have:
for (int i = 0; i < embedder_fields_count; i++) {
StartupData data = serialized_data[i];
if (DataIsEmpty(data)) continue;
// Restore original values from cleared fields.
EmbedderDataSlot(*js_obj, i).store_raw(isolate(), original_embedder_values[i], no_gc);
embedder_fields_sink_.Put(kNewObject, "embedder field holder");
embedder_fields_sink_.PutInt(reference->back_ref_index(), "BackRefIndex");
embedder_fields_sink_.PutInt(i, "embedder field index");
embedder_fields_sink_.PutInt(data.raw_size, "embedder fields data size");
embedder_fields_sink_.PutRaw(reinterpret_cast<const byte*>(data.data),
data.raw_size, "embedder fields data");
delete[] data.data;
}
TODO: take a closer look at how this works. So what the StartupData that is returned from the callback will be placed into the output sink for the.
$ lldb -- ./test/snapshot_test --gtest_filter=SnapshotTest.InternalFields
(lldb) br s -f snapshot_test.cc -l 244
Breakpoint 4: where = snapshot_test`SnapshotTest_InternalFields_Test::TestBody() + 25 at snapshot_test.cc:247:25, address = 0x00000000004192e3
(lldb) r
-> 283 index = snapshot_creator.AddContext(context, si_cb);
(lldb) s
In SnapshotCreator (api.cc) we can find the following:
size_t SnapshotCreator::AddContext(Local<Context> context,
SerializeInternalFieldsCallback callback) {
SnapshotCreatorData* data = SnapshotCreatorData::cast(data_);
...
size_t index = data->contexts_.Size();
data->contexts_.Append(context);
data->embedder_fields_serializers_.push_back(callback)
return index;
So the SnapshotCreateData instance has a vector that contains these callback:
std::vector<SerializeInternalFieldsCallback> embedder_fields_serializers_;
So that is all AddContext
does. Lets take a look at the call to actually
create the blob:
startup_data = snapshot_creator.CreateBlob(SnapshotCreator::FunctionCodeHandling::kKeep);
The layout of the binary snapshot can be found in src/snapshot/snapshot.cc
:
// Snapshot blob layout:
// [0] number of contexts N
// [1] rehashability
// [2] checksum
// [3] (128 bytes) version string
// [4] offset to readonly
// [5] offset to context 0
// [6] offset to context 1
// ...
// ... offset to context N - 1
// ... startup snapshot data
// ... read-only snapshot data
// ... context 0 snapshot data
// ... context 1 snapshot data