Skip to content

Optimization Examples Using Cpp

masajiro edited this page May 25, 2020 · 2 revisions

Further improvement of the search performance can be achieved by additional processing as follows.

ANNG optimization

ANNG quick optimization (recommended)

Optimization for the number of initial edges

The default number of initial edges for each node in a default graph (ANNG) is a fixed value 10. To optimize the number, first, the objects should be inserted without building indexes as follows.

/ construct-optimized-anng.cpp                                                                                                              
#include        "NGT/Index.h"
using namespace std;
int main(int argc, char **argv)
{
  ifstream      is("objects.tsv");
  string        line;
  getline(is, line);
  stringstream  linestream(line);
  NGT::Property property;
  size_t n;
  linestream >> n;
  linestream >> property.dimension;
  property.objectType   = NGT::ObjectSpace::ObjectType::Float;
  property.distanceType = NGT::Index::Property::DistanceType::DistanceTypeCosine;
  string path("fasttext.anng");
  NGT::Index::create(path, property);
  NGT::Index    index(path);
  while (getline(is, line)) {
    stringstream        linestream(line);
    vector<float>       object;
    while (!linestream.eof()) {
      float value;
      linestream >> value;
      object.push_back(value);
    }
    index.append(object);
  }
  index.save();
  return 0;
}

Next, execute the optimization script as follows. This script extracts the optimal number of initial edges for the inserted objects and just set the number into the index profile.

// optimize-anng.cpp
#include        "NGT/Index.h"
#include        "NGT/GraphOptimizer.h"

using namespace std;
int main(int argc, char **argv)
{
  NGT::GraphOptimizer::ANNGEdgeOptimizationParameter parameter;
  NGT::GraphOptimizer graphOptimizer(false);                                                                                
  graphOptimizer.optimizeNumberOfEdgesForANNG("fasttext.anng", parameter);      
}

Then, build indexes as follows.

// build-optimized-anng.cpp
#include        "NGT/Index.h"
using namespace std;
int main(int argc, char **argv)
{
  NGT::Index    index("fasttext.anng");
  index.createIndex(16);        // 16 is the number of threads to construct.
  index.save();
  return 0;
}

Optimization for the search parameters

The next script finally optimizes the search parameters about the explored edges and memory prefetch for the existing indexes. This script does not modify the index data structure.

// optimize-search-parameters.cpp                                                                                                            
#include        "NGT/Index.h"
#include        "NGT/GraphOptimizer.h"

using namespace std;
int main(int argc, char **argv)
{
  NGT::GraphOptimizer graphOptimizer(false);
  graphOptimizer.setProcessingModes(false, true, true, true);
  graphOptimizer.optimizeSearchParameters("fasttext.anng");
}

Since an accuracy table which converts expected accuracies into epsilons is generated as well, an expected accuracy value instead of an epsilon value can be specified for search as follows after this optimization.

  searchQuery.setExpectedAccuracy(0.9);  // instead of setEpsilon()

Optinal ANNG refinement

Above-mentioned ANNG optimization will bring adequate search performance improvement for ANNG. Refining ANNG (RANNG) also improves the search performance, because it improves accuracy of neighboring nodes for each node by searching with each node. However, it is optional because it needs a long processing time. The following command to refine an ANNG must be executed after building the indexes.

// refine-anng.cpp 
#include        "NGT/Index.h"
#include        "NGT/GraphOptimizer.h"
using namespace std;
int main(int argc, char **argv)
{
  NGT::Index    index("fasttext.anng");
  NGT::GraphReconstructor::refineANNG(index, false);
  index.save("fasttext.ranng");
  return 0;
}

Graph structure optimization (ONNG construction)

Since further more improvement of the search performance by using an ONNG can be expected, how to generate ONNG is described.

ANNG to reconstruct ONNG

ONNG generation requires an ANNG with more edges than default initial edges as the excess edges are removed to optimize the graph. ONNG requires an ANNG at least with edges that are optimized as mentioned above. As a result, an ONNG from the ANNG can provide almost the best performance. However, if the best performance is needed, a larger number than the optimized number may be specified. The following example shows the creation of an ANNG with 100 edges.

// construct-fasttext-100.cpp                                                                                                                
#include        "NGT/Index.h"
using namespace std;
int main(int argc, char **argv)
{
  ifstream      is("objects.tsv");
  string        line;
  getline(is, line);
  stringstream  linestream(line);
  NGT::Property property;
  size_t n;
  linestream >> n;
  linestream >> property.dimension;
  property.objectType           = NGT::ObjectSpace::ObjectType::Float;
  property.distanceType         = NGT::Index::Property::DistanceType::DistanceTypeCosine;
  property.edgeSizeForCreation  = 100;
  string path("fasttext.anng-100");
  NGT::Index::create(path, property);
  NGT::Index    index(path);
  while (getline(is, line)) {
    stringstream        linestream(line);
    vector<float>       object;
    while (!linestream.eof()) {
      float value;
      linestream >> value;
      object.push_back(value);
    }
    index.append(object);
  }
  index.createIndex(16);        // 16 is the number of threads to construct.                                                                 
  index.save();
  return 0;
}

Conversion from ANNG to ONNG

An ONNG can be converted from an ANNG with the NGT::GraphOptimizer as follows.

// build-onng.cpp                                                                                                                            
#include        "NGT/Index.h"
#include        "NGT/GraphOptimizer.h"

using namespace std;
int main(int argc, char **argv)
{
  NGT::GraphOptimizer   graphOptimizer(false);
  int numOfOutgoingEdges = 10;
  int numOfIncomingEdges = 100;
  int numOfQueries = 200;
  int numOfResultantObjects = 20; // k                                                                                                       
  graphOptimizer.set(numOfOutgoingEdges, numOfIncomingEdges, numOfQueries, numOfResultantObjects);
  graphOptimizer.execute("fasttext.anng-100", "fasttext.onng");
  return 0;
}

execute() executes degree adjustment, shortcut reduction (pass adjustment), search and prefetch parameter adjustment and accuracy table generation. numOfOutgoingEdges and numOfIncomingEdges specify the number of outgoing edges and incoming edges, respectively. numOfQueries is the number of query objects to optimize. numOfResultantObjects is the number of resultant objects to be searched. In this example, the number of incoming edges is the same as the number of edges for ANNG creation, i.e., 100. However, since the number of real edges of an ANNG is more than the specified number, a larger number of incoming edges can be specified.

Search with ONNG

The search-fasttext.cpp above must be modified to use the ONNG by replacing 'fasttext.anng' with 'fasttext.onng' on the following line.

  NGT::Index            index("fasttext.onng");

Even if the search time increases compared with the result of the first ANNG, the search accuracy of the ONNG should be higher. Therefore, since the epsilon value for an ANNG needs to be increased to acquire higher accuracies, the search time of an ANNG increases more than that of an ONNG.