
Efficiency measurements (how to gather?) #23

Open
lintool opened this issue Apr 5, 2019 · 6 comments

@lintool
Member

lintool commented Apr 5, 2019

Hi @andrewtrotman, can you think about how you'd like the jig to report efficiency metrics? I see a few options:

  1. the jig could record it - but the measurements would be coarse-grained
  2. the image itself could record it - but it would have to relay the metrics back to the jig in some standard format

Both have their advantages and disadvantages... thoughts?

@amallia
Member

amallia commented Apr 5, 2019

In terms of efficiency, in my opinion, it is fundamental to measure the size of the index.
Being able to store additional data can definitely improve query processing speed and quality, which, in turn, corresponds to higher main-memory utilization.

Should we impose specific hard limits (memory, disk space, CPUs, ...) on the running Docker instance? For example, by forcing the container to run on a single CPU we would ensure that ad-hoc retrieval runs on a single core too.
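If we do want such limits, Docker's resource flags already cover this. A minimal sketch of what the jig could do; the image name and its sub-command are placeholders, not an agreed interface:

```sh
# Hedged sketch: pin the container to one CPU and cap its memory so that every
# image runs under the same resource limits. The image name ("example/searcher")
# and its "search" sub-command are hypothetical placeholders.
docker run --rm \
  --cpus=1 \
  --cpuset-cpus="0" \
  --memory=16g \
  example/searcher search
```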

Moreover, is efficiency only related to query processing? Is efficiency of indexing relevant at all?

@andrewtrotman

Efficiency essentially breaks down into efficiency of space and efficiency of time.

In the case of the indexer I think we can just use the output of the UNIX time command to tell us how long it took to build the index. If the indexer also reports time it would be interesting to see how the two compare. We can use the UNIX ls command to see how large the index is, but the indexer will need to tell us where to look.
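Something along these lines, as a sketch only; the image name, sub-command, and mounted paths are hypothetical, and I'm assuming the indexer tells us which directory to measure:

```sh
# Hedged sketch: wall-clock the whole indexing run with the UNIX time command,
# then measure the on-disk size of whatever directory the indexer said it wrote
# the index to. The image name ("example/indexer"), its "index" sub-command,
# and the paths are hypothetical placeholders.
/usr/bin/time -p docker run --rm \
  -v "$PWD/corpus:/corpus" -v "$PWD/index:/index" \
  example/indexer index --collection /corpus --output /index

# Index size: du for the total, ls for a per-file breakdown.
du -sh index/
ls -l index/
```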

For the search, I think the 250 topics we have are far too few for measuring search time. The brief test I ran suggested that some of those topics will take near enough to zero time. So I think we should use the 10,000 topics from the TREC Million Query Track (or 20,000 if we use both years). I'd like to compare what the search engine claims against what the UNIX time command claims. Sure, UNIX time will include start-up, shut-down, and index-load time, but that is why we also need to look at what the search engine claims.

So we need, I think, a "spec":

Nothing really for indexing (is there?), just agreement on a single line of output that states where the index can be found, so that we can start the container and "ls" to get the index size. We can easily change the jig to call the UNIX time command.

For search, we need to agree on when we start the timer, when it ends, and what we are measuring (throughput or latency). We can turn throughput into latency by setting the thread count to 1, so let's measure throughput. I think we start the timer at the last possible moment before the first query and stop it at the first possible moment after we complete the last query. As we all have the same I/O demands when it comes to producing the TREC run file, we could agree to either include or exclude that time - thoughts, please.
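To make the comparison concrete, a rough sketch of how the jig could wrap a single-threaded run; the image name, sub-command, flags, and topic file are hypothetical, and I'm assuming the engine prints its own query-processing time so we can set it against the wall-clock figure:

```sh
# Hedged sketch: time a single-threaded search run over the Million Query topics
# from the outside. UNIX time includes start-up, index load, and shut-down; the
# engine's own timer (started just before the first query, stopped just after
# the last) is what it reports itself, so the two can be compared. The image
# name, sub-command, and flags are hypothetical placeholders.
/usr/bin/time -p docker run --rm --cpus=1 \
  -v "$PWD/topics:/topics" -v "$PWD/output:/output" \
  example/searcher search --topics /topics/mq-topics.txt \
  --threads 1 --run /output/run.txt

# Throughput = (number of topics) / (engine-reported seconds); with one thread
# the mean per-query latency is just the reciprocal.
```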

@frrncl
Member

frrncl commented Apr 12, 2019

Hi,

What about indexing time in the case of ML stuff? Should we break it down into training, validation, ...? Also, do we need some breakdown of the notion of index size in this case?

Nicola

@andrewtrotman

Agreed - we need to measure the efficiency of the ML stuff. I'm hoping there's a chance to do the ML stuff before indexing, because I want to learn the best solution and then bake it into my index.

@albpurpura
Member

NVSM performs indexing before training and validation. I think indexing could be a separate step from training and test for NeuIR models as well. Training, validation, and test are performed on different subsets of topics specified by the user (without cross-validation).
To summarize, the steps we consider are:

  1. indexing
  2. training and validation (with early stopping)
  3. test

What do you think of this sequence of steps? Can we adopt this also for other NeuIR models?
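A rough sketch of how the jig might drive the three phases as separate, separately timed container invocations; the image name, sub-commands, paths, and topic splits are hypothetical, not an agreed interface:

```sh
# Hedged sketch: one container invocation per phase, each timed separately, so
# indexing, training/validation, and test efficiency can be reported apart.
# The image name ("example/nvsm"), sub-commands, and paths are hypothetical
# placeholders.
/usr/bin/time -p docker run --rm -v "$PWD/data:/data" \
  example/nvsm index --collection /data/corpus --output /data/index

/usr/bin/time -p docker run --rm -v "$PWD/data:/data" \
  example/nvsm train --index /data/index --topics /data/topics.train-val

/usr/bin/time -p docker run --rm -v "$PWD/data:/data" \
  example/nvsm search --index /data/index --topics /data/topics.test \
  --run /data/run.txt
```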

@cmacdonald
Member

The (nuclear) alternative would be for efficiency to be measured by the jig itself, which would send queries on stdin (one by one).
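A minimal sketch of that alternative, assuming the image accepts one query per line on stdin and writes TREC run lines to stdout; the image name and its flag are hypothetical:

```sh
# Hedged sketch: the jig feeds topics one per line over stdin and wall-clocks
# the whole exchange; per-query latency could then be derived from timestamps
# the jig records as it emits each line. The image name ("example/searcher")
# and its "--stdin" flag are hypothetical placeholders.
/usr/bin/time -p sh -c '
  while IFS= read -r query; do
    printf "%s\n" "$query"
  done < topics/mq-topics.txt |
  docker run --rm -i --cpus=1 example/searcher search --stdin > run.txt
'
```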

In any case, I agree that we should record the number of cores & threads involved in both retrieval and indexing, so we get like-for-like comparisons.
