Reduce the size of the index (#3666)
This change seeks to reduce the size of the index in two ways:
1. Better schema design for the 1:N mapping tables
2. Dropping some of the mapping data that is not particularly interesting per manifest (aka per version)

### Better schema design
This is achieved by giving each map table no `rowid` and a primary key that puts the value column first. The table is then stored already sorted by value, so reverse lookups (from a value to its manifests) are fast. Dropping the `rowid` also removes a fair amount of data from the table itself, since it accounted for roughly a third of each row.
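To make the layout concrete, here is a minimal C++ sketch that models such a map table in memory, assuming it is declared `WITHOUT ROWID` with a `(value, manifest)` primary key. The table and column names in the comment, and the `ValueFirstMapTable` class itself, are hypothetical and not taken from the winget schema; `std::set` stands in for the B-tree that SQLite keeps ordered by primary key.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <set>
#include <utility>
#include <vector>

// Hypothetical in-memory model of a 1:N map table that might be declared as
//   CREATE TABLE tags_map (tag INTEGER NOT NULL, manifest INTEGER NOT NULL,
//                          PRIMARY KEY (tag, manifest)) WITHOUT ROWID;
// (names are illustrative). A WITHOUT ROWID table is stored as a B-tree
// ordered by its primary key; std::set mimics that ordering here.
using RowId = std::int64_t;

class ValueFirstMapTable
{
public:
    // Each row is just (value, manifest); no separate rowid is stored.
    // A PRIMARY KEY makes each pair unique; the set silently ignores a
    // repeat instead of raising a constraint error.
    void Insert(RowId value, RowId manifest) { m_rows.emplace(value, manifest); }

    // Reverse lookup (value -> manifests): because the key starts with the
    // value, all matching rows are adjacent, so this is a single range scan.
    std::vector<RowId> ManifestsForValue(RowId value) const
    {
        std::vector<RowId> result;
        auto it = m_rows.lower_bound({ value, std::numeric_limits<RowId>::min() });
        for (; it != m_rows.end() && it->first == value; ++it)
        {
            result.push_back(it->second);
        }
        return result;
    }

    std::size_t RowCount() const { return m_rows.size(); }

private:
    std::set<std::pair<RowId, RowId>> m_rows;
};
```

With a `rowid` table, the same reverse lookup would typically need a separate index on the value column, which stores the pair a second time; the value-first key provides that ordering in the table itself.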

### Dropping map data
We don't actually rely on knowing that different versions have different tags (or different values for any other mapped data). Thus, a single manifest entry per package identifier can hold all of the values while keeping the same functionality. There is a slight loss of fidelity when reading the values through the API, but this is deemed acceptable given the large data savings. I explicitly left the product codes alone, as keeping them per version does have value (even if we are not using it currently).
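The folding step can be pictured as re-pointing every map row at one manifest per package and letting identical rows collapse. This C++ sketch rests on assumptions: `FoldMapData` and `MapRows` are hypothetical names, and choosing the manifest with the largest id as the survivor is inferred from the `manifestId1 > manifestId2` branch of the PFN test in this commit rather than stated anywhere. Per the note above, it would apply to data such as tags and package family names, but not to product codes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

using RowId = std::int64_t;
// (value id, manifest id) rows of a hypothetical 1:N map table.
using MapRows = std::set<std::pair<RowId, RowId>>;

// Hypothetical sketch of map data folding: every value attached to any
// manifest of a package is re-attached to one manifest of that package.
// Assumption: the surviving manifest is the one with the largest id.
MapRows FoldMapData(const std::map<RowId, RowId>& manifestToPackage, const MapRows& rows)
{
    // Choose the target (largest) manifest id for each package.
    std::map<RowId, RowId> packageToTarget;
    for (const auto& [manifest, package] : manifestToPackage)
    {
        auto [it, inserted] = packageToTarget.try_emplace(package, manifest);
        if (!inserted && manifest > it->second)
        {
            it->second = manifest;
        }
    }

    // Re-point every row at its package's target; identical (value, target)
    // pairs collapse into one row, which is where the space saving comes from.
    MapRows folded;
    for (const auto& [value, manifest] : rows)
    {
        folded.emplace(value, packageToTarget.at(manifestToPackage.at(manifest)));
    }
    return folded;
}
```

A search by value still finds the package, because the value remains attached to one of its manifests; only the per-version attribution is lost.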
JohnMcPMS authored Sep 25, 2023
1 parent 9c5c5db commit 657d33c
Showing 24 changed files with 636 additions and 133 deletions.
2 changes: 2 additions & 0 deletions .github/actions/spelling/expect.txt
@@ -103,6 +103,7 @@ DACL
 datetimeoffset
 Dbg
 debian
+dedupe
 deigh
 deleteifnotneeded
 DENYWR
@@ -264,6 +265,7 @@ luffy
 Luffytaro
 maclachlan
 malware
+mapdatafolding
 mapview
 Maxed
 maxvalue
210 changes: 187 additions & 23 deletions src/AppInstallerCLITests/SQLiteIndex.cpp
@@ -35,49 +35,53 @@ SQLiteIndex CreateTestIndex(const std::string& filePath, std::optional<Schema::V
     // If no specific version requested, then use generator to run against the last 3 versions.
     if (!version)
     {
-        version = GENERATE(Schema::Version{ 1, 2 }, Schema::Version{ 1, 3 }, Schema::Version{ 1, 4 }, Schema::Version::Latest());
+        Schema::Version latestVersion = SQLiteIndex::GetLatestVersion();
+        if (latestVersion.MajorVersion != 1)
+        {
+            throw std::exception("You added major version 2, figure out how to deal with these tests that do back compat coverage!");
+        }
+
+        // Relies on the fact that min version is already >= 2
+        Schema::Version versionMinus1 = Schema::Version{ 1, latestVersion.MinorVersion - 1 };
+        Schema::Version versionMinus2 = Schema::Version{ 1, latestVersion.MinorVersion - 2 };
+
+        version = GENERATE_COPY(Schema::Version{ versionMinus2 }, Schema::Version{ versionMinus1 }, Schema::Version{ latestVersion });
     }
 
     return SQLiteIndex::CreateNew(filePath, version.value());
 }
 
 Schema::Version TestPrepareForRead(SQLiteIndex& index)
 {
-    if (index.GetVersion() == Schema::Version{ 1, 2 })
+    Schema::Version latestVersion = SQLiteIndex::GetLatestVersion();
+    if (latestVersion.MajorVersion != 1)
     {
-        Schema::Version version = GENERATE(Schema::Version{ 1, 2 });
-
-        if (version != Schema::Version{ 1, 2 })
-        {
-            index.ForceVersion(version);
-            return version;
-        }
+        throw std::exception("You added major version 2, figure out how to deal with these tests that do back compat coverage!");
     }
-    else if (index.GetVersion() == Schema::Version{ 1, 3 })
-    {
-        Schema::Version version = GENERATE(Schema::Version{ 1, 2 }, Schema::Version{ 1, 3 });
 
-        if (version != Schema::Version{ 1, 3 })
-        {
-            index.ForceVersion(version);
-            return version;
-        }
+    // Relies on the fact that min version is already >= 2
+    Schema::Version versionMinus1 = Schema::Version{ 1, latestVersion.MinorVersion - 1 };
+    Schema::Version versionMinus2 = Schema::Version{ 1, latestVersion.MinorVersion - 2 };
+
+    if (index.GetVersion() == versionMinus2)
+    {
+        // Degenerate case where we don't need to do anything
     }
-    else if (index.GetVersion() == Schema::Version{ 1, 4 })
+    else if (index.GetVersion() == versionMinus1)
     {
-        Schema::Version version = GENERATE(Schema::Version{ 1, 2 }, Schema::Version{ 1, 3 }, Schema::Version{ 1, 4 });
+        Schema::Version version = GENERATE_COPY(Schema::Version{ versionMinus2 }, Schema::Version{ versionMinus1 });
 
-        if (version != Schema::Version{ 1, 4 })
+        if (version != versionMinus1)
         {
             index.ForceVersion(version);
             return version;
         }
     }
-    else if (index.GetVersion() == Schema::Version{ 1, 5 })
+    else if (index.GetVersion() == latestVersion)
     {
-        Schema::Version version = GENERATE(Schema::Version{ 1, 2 }, Schema::Version{ 1, 3 }, Schema::Version{ 1, 4 }, Schema::Version{ 1, 5 });
+        Schema::Version version = GENERATE_COPY(Schema::Version{ versionMinus2 }, Schema::Version{ versionMinus1 }, Schema::Version{ latestVersion });
 
-        if (version != Schema::Version{ 1, 5 })
+        if (version != latestVersion)
         {
             index.ForceVersion(version);
             return version;
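Both helpers now derive a three-version back-compat window from `SQLiteIndex::GetLatestVersion()` instead of hard-coding specific versions. The sketch below restates that window selection on its own; `Version`, `CreateVersions`, and `ReadVersions` are hypothetical names, not part of the test code. With 1.7 as the latest schema (an assumption suggested by the new `Microsoft\Schema\1_7` files), indexes are created at 1.5, 1.6, and 1.7, and each is read back at every window version that is not newer than itself.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for Schema::Version; only major/minor are modeled.
struct Version
{
    unsigned Major;
    unsigned Minor;
    bool operator==(const Version& other) const { return Major == other.Major && Minor == other.Minor; }
};

// Versions a test index is created at: the latest schema and the two minors before it.
std::vector<Version> CreateVersions(Version latest)
{
    // The test code throws only on a major-version change and relies on the
    // minor already being >= 2; this sketch checks both explicitly.
    if (latest.Major != 1 || latest.Minor < 2)
    {
        throw std::logic_error("back-compat window needs major 1 and minor >= 2");
    }
    return { { 1, latest.Minor - 2 }, { 1, latest.Minor - 1 }, latest };
}

// Versions an index created at `indexVersion` is read as: every window version
// at or below the index's own, mirroring which versions TestPrepareForRead forces.
std::vector<Version> ReadVersions(Version indexVersion, Version latest)
{
    std::vector<Version> result;
    for (const Version& v : CreateVersions(latest))
    {
        if (v.Minor <= indexVersion.Minor)
        {
            result.push_back(v);
        }
    }
    return result;
}
```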
@@ -321,6 +325,17 @@ bool AreArpVersionsSupported(const SQLiteIndex& index, const Schema::Version& te
     return (index.GetVersion() >= Schema::Version{ 1, 5 } && testVersion >= Schema::Version{ 1, 5 });
 }
 
+bool IsMapDataFoldingSupported(const SQLiteIndex& index, const Schema::Version& testVersion)
+{
+    UNSCOPED_INFO("Index " << index.GetVersion() << " | Test " << testVersion);
+    return (index.GetVersion() >= Schema::Version{ 1, 7 } && testVersion >= Schema::Version{ 1, 7 });
+}
+
+bool IsMapDataFolded(const SQLiteIndex& index)
+{
+    return (index.GetVersion() >= Schema::Version{ 1, 7 });
+}
+
 std::string GetPropertyStringByKey(const SQLiteIndex& index, SQLite::rowid_t id, PackageVersionProperty property, std::string_view version, std::string_view channel)
 {
     auto manifestId = index.GetManifestIdByKey(id, version, channel);
@@ -3187,3 +3202,152 @@ TEST_CASE("SQLiteIndex_CheckConsistency_FindEmbeddedNull", "[sqliteindex]")
 
     REQUIRE(!index.CheckConsistency(true));
 }
+
+TEST_CASE("SQLiteIndex_MapDataFolding_Tags", "[sqliteindex][mapdatafolding]")
+{
+    TempFile tempFile{ "repolibtest_tempdb"s, ".db"s };
+    INFO("Using temporary file named: " << tempFile.GetPath());
+
+    std::string tag1 = "Tag1";
+    std::string tag2 = "Tag2";
+
+    SQLiteIndex index = SearchTestSetup(tempFile, {
+        { "Id", "Name", "Publisher", "Moniker", "Version1", "", { tag1 }, { "Command" }, "Path1", {}, { "PC1" } },
+        { "Id", "Name", "Publisher", "Moniker", "Version2", "", { tag2 }, { "Command" }, "Path2", {}, { "PC2" } },
+    });
+
+    // Apply the map data folding if it is present in the created test index.
+    index.PrepareForPackaging();
+
+    Schema::Version testVersion = TestPrepareForRead(index);
+
+    SearchRequest request1;
+    request1.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::Tag, MatchType::Exact, tag1));
+    auto results1 = index.Search(request1);
+
+    SearchRequest request2;
+    request2.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::Tag, MatchType::Exact, tag2));
+    auto results2 = index.Search(request2);
+
+    REQUIRE(results1.Matches.size() == 1);
+    REQUIRE(results2.Matches.size() == 1);
+    REQUIRE(results1.Matches[0].first == results2.Matches[0].first);
+}
+
+TEST_CASE("SQLiteIndex_MapDataFolding_PFNs", "[sqliteindex][mapdatafolding]")
+{
+    TempFile tempFile{ "repolibtest_tempdb"s, ".db"s };
+    INFO("Using temporary file named: " << tempFile.GetPath());
+
+    std::string pfn1 = "PFN1";
+    std::string pfn2 = "PFN2";
+
+    SQLiteIndex index = SearchTestSetup(tempFile, {
+        { "Id", "Name", "Publisher", "Moniker", "Version1", "", { }, { "Command" }, "Path1", { pfn1 }, { } },
+        { "Id", "Name", "Publisher", "Moniker", "Version2", "", { }, { "Command" }, "Path2", { pfn2 }, { } },
+    });
+
+    // Apply the map data folding if it is present in the created test index.
+    index.PrepareForPackaging();
+
+    Schema::Version testVersion = TestPrepareForRead(index);
+
+    SearchRequest request1;
+    request1.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::PackageFamilyName, MatchType::Exact, pfn1));
+    auto results1 = index.Search(request1);
+
+    SearchRequest request2;
+    request2.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::PackageFamilyName, MatchType::Exact, pfn2));
+    auto results2 = index.Search(request2);
+
+    REQUIRE(results1.Matches.size() == 1);
+    REQUIRE(results2.Matches.size() == 1);
+    REQUIRE(results1.Matches[0].first == results2.Matches[0].first);
+
+    auto versionKeys = index.GetVersionKeysById(results1.Matches[0].first);
+    REQUIRE(versionKeys.size() == 2);
+
+    auto manifestId1 = index.GetManifestIdByKey(results1.Matches[0].first, versionKeys[0].GetVersion().ToString(), versionKeys[0].GetChannel().ToString());
+    auto manifestId2 = index.GetManifestIdByKey(results1.Matches[0].first, versionKeys[1].GetVersion().ToString(), versionKeys[1].GetChannel().ToString());
+
+    REQUIRE(manifestId1.has_value());
+    REQUIRE(manifestId2.has_value());
+
+    auto pfnValues1 = index.GetMultiPropertyByManifestId(manifestId1.value(), PackageVersionMultiProperty::PackageFamilyName);
+    auto pfnValues2 = index.GetMultiPropertyByManifestId(manifestId2.value(), PackageVersionMultiProperty::PackageFamilyName);
+
+    if (IsMapDataFoldingSupported(index, testVersion))
+    {
+        REQUIRE(pfnValues1.size() == 2);
+        REQUIRE(pfnValues2.size() == 2);
+        REQUIRE(pfnValues1[0] != pfnValues1[1]);
+    }
+    else if (IsMapDataFolded(index))
+    {
+        if (manifestId1 > manifestId2)
+        {
+            REQUIRE(pfnValues1.size() == 2);
+            REQUIRE(pfnValues2.size() == 0);
+            REQUIRE(pfnValues1[0] != pfnValues1[1]);
+        }
+        else
+        {
+            REQUIRE(pfnValues1.size() == 0);
+            REQUIRE(pfnValues2.size() == 2);
+            REQUIRE(pfnValues2[0] != pfnValues2[1]);
+        }
+    }
+    else
+    {
+        REQUIRE(pfnValues1.size() == 1);
+        REQUIRE(pfnValues2.size() == 1);
+        REQUIRE(pfnValues1[0] != pfnValues2[0]);
+    }
+}
+
+TEST_CASE("SQLiteIndex_MapDataFolding_ProductCodes", "[sqliteindex][mapdatafolding]")
+{
+    TempFile tempFile{ "repolibtest_tempdb"s, ".db"s };
+    INFO("Using temporary file named: " << tempFile.GetPath());
+
+    std::string pc1 = "PC1";
+    std::string pc2 = "PC2";
+
+    SQLiteIndex index = SearchTestSetup(tempFile, {
+        { "Id", "Name", "Publisher", "Moniker", "Version1", "", { }, { "Command" }, "Path1", { }, { pc1 } },
+        { "Id", "Name", "Publisher", "Moniker", "Version2", "", { }, { "Command" }, "Path2", { }, { pc2 } },
+    });
+
+    // Apply the map data folding if it is present in the created test index.
+    index.PrepareForPackaging();
+
+    Schema::Version testVersion = TestPrepareForRead(index);
+
+    SearchRequest request1;
+    request1.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::ProductCode, MatchType::Exact, pc1));
+    auto results1 = index.Search(request1);
+
+    SearchRequest request2;
+    request2.Inclusions.emplace_back(PackageMatchFilter(PackageMatchField::ProductCode, MatchType::Exact, pc2));
+    auto results2 = index.Search(request2);
+
+    REQUIRE(results1.Matches.size() == 1);
+    REQUIRE(results2.Matches.size() == 1);
+    REQUIRE(results1.Matches[0].first == results2.Matches[0].first);
+
+    auto versionKeys = index.GetVersionKeysById(results1.Matches[0].first);
+    REQUIRE(versionKeys.size() == 2);
+
+    auto manifestId1 = index.GetManifestIdByKey(results1.Matches[0].first, versionKeys[0].GetVersion().ToString(), versionKeys[0].GetChannel().ToString());
+    auto manifestId2 = index.GetManifestIdByKey(results1.Matches[0].first, versionKeys[1].GetVersion().ToString(), versionKeys[1].GetChannel().ToString());
+
+    REQUIRE(manifestId1.has_value());
+    REQUIRE(manifestId2.has_value());
+
+    auto pcValues1 = index.GetMultiPropertyByManifestId(manifestId1.value(), PackageVersionMultiProperty::ProductCode);
+    auto pcValues2 = index.GetMultiPropertyByManifestId(manifestId2.value(), PackageVersionMultiProperty::ProductCode);
+
+    REQUIRE(pcValues1.size() == 1);
+    REQUIRE(pcValues2.size() == 1);
+    REQUIRE(pcValues1[0] != pcValues2[0]);
+}
@@ -385,6 +385,7 @@
     <ClInclude Include="Microsoft\Schema\1_6\Interface.h" />
     <ClInclude Include="Microsoft\Schema\1_6\SearchResultsTable.h" />
     <ClInclude Include="Microsoft\Schema\1_6\UpgradeCodeTable.h" />
+    <ClInclude Include="Microsoft\Schema\1_7\Interface.h" />
     <ClInclude Include="Microsoft\Schema\IPinningIndex.h" />
     <ClInclude Include="Microsoft\Schema\IPortableIndex.h" />
     <ClInclude Include="Microsoft\Schema\ISQLiteIndex.h" />
@@ -488,6 +489,7 @@
     <ClCompile Include="Microsoft\Schema\1_5\Interface_1_5.cpp" />
     <ClCompile Include="Microsoft\Schema\1_6\Interface_1_6.cpp" />
     <ClCompile Include="Microsoft\Schema\1_6\SearchResultsTable_1_6.cpp" />
+    <ClCompile Include="Microsoft\Schema\1_7\Interface_1_7.cpp" />
     <ClCompile Include="Microsoft\Schema\MetadataTable.cpp" />
     <ClCompile Include="Microsoft\Schema\Pinning_1_0\PinningIndexInterface_1_0.cpp" />
     <ClCompile Include="Microsoft\Schema\Pinning_1_0\PinTable.cpp" />
@@ -91,6 +91,9 @@
     <Filter Include="Rest\Schema\1_6\Json">
       <UniqueIdentifier>{b2e78f3d-931e-432c-8485-255b1dbc9db7}</UniqueIdentifier>
     </Filter>
+    <Filter Include="Microsoft\Schema\1_7">
+      <UniqueIdentifier>{f610927a-6f1d-42c5-9ad9-b59790091944}</UniqueIdentifier>
+    </Filter>
   </ItemGroup>
   <ItemGroup>
     <ClInclude Include="pch.h">
@@ -387,6 +390,9 @@
     <ClInclude Include="Rest\Schema\1_6\Interface.h">
       <Filter>Rest\Schema\1_6</Filter>
     </ClInclude>
+    <ClInclude Include="Microsoft\Schema\1_7\Interface.h">
+      <Filter>Microsoft\Schema\1_7</Filter>
+    </ClInclude>
   </ItemGroup>
   <ItemGroup>
     <ClCompile Include="pch.cpp">
@@ -614,6 +620,9 @@
     <ClCompile Include="Rest\Schema\1_6\RestInterface_1_6.cpp">
       <Filter>Rest\Schema\1_6</Filter>
     </ClCompile>
+    <ClCompile Include="Microsoft\Schema\1_7\Interface_1_7.cpp">
+      <Filter>Microsoft\Schema\1_7</Filter>
+    </ClCompile>
   </ItemGroup>
   <ItemGroup>
     <None Include="PropertySheet.props" />