
How to add new metrics

Abel Serrano Juste edited this page May 14, 2018 · 3 revisions

Here are the steps for adding a new metric to WikiChron.

The code that computes each metric lives in WikiChron's source code, in lib/metrics/stats.py.

For each new metric, you need to define a function (most likely in the stats.py file) that computes the metric's values. This function must take two arguments, (data, index), and return a pandas time series, that is, a Series with a date-based index.

  • data is the whole edit history of the wiki (excluding bot activity), formatted as a pandas DataFrame.
  • index is a pandas DatetimeIndex with all the months between the wiki's creation date and the last date in the dump. Use it as the base index to reindex your data, so that months with no data are not accidentally dropped when you group or otherwise aggregate the data.
The data DataFrame argument has the following format:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21628 entries, 0 to 21627
Data columns (total 8 columns):
page_id             21628 non-null int64
page_title          21628 non-null object
page_ns             21628 non-null int64
revision_id         21628 non-null int64
timestamp           21628 non-null datetime64[ns]
contributor_id      21628 non-null object
contributor_name    21628 non-null object
bytes               21628 non-null int64
dtypes: datetime64[ns](1), int64(4), object(3)
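
A metric function with the (data, index) signature described above could look like the following minimal sketch. It is not taken from WikiChron's code: the name edits_per_month is hypothetical, and the sketch assumes that index holds month-start dates, so the grouped data is labelled with month-start dates as well. The small DataFrame at the end is made-up sample data that only uses the timestamp and revision_id columns.

```python
import pandas as pd


def edits_per_month(data, index):
    """Hypothetical metric: number of revisions made in each month."""
    # Group revisions by calendar month; 'MS' labels each group with the
    # first day of the month (assumed to match the labels used in `index`).
    monthly = data.groupby(pd.Grouper(key='timestamp', freq='MS')).size()
    # Reindex against the full month index so that months without any
    # edits are kept in the result, with a count of 0.
    return monthly.reindex(index, fill_value=0)


# Made-up sample data: three revisions, none in December, February or April.
data = pd.DataFrame({
    'revision_id': [1, 2, 3],
    'timestamp': pd.to_datetime(['2018-01-05', '2018-01-20', '2018-03-02']),
})
index = pd.date_range('2017-12-01', '2018-04-01', freq='MS')

series = edits_per_month(data, index)
print(series)
```

Without the reindex step, the months before the first edit and after the last one would be missing from the result, which is the problem the index argument is meant to prevent.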

Then, you need to create an instance of the Metric class and append it to the list returned by generate_metrics() in the lib/metrics/metrics_generator.py file. If your metric function isn't in the lib/metrics/stats.py file, make sure that it is imported in lib/metrics/metrics_generator.py.

An instance of the Metric class has to have the following attributes:

  • code: a unique string id for this metric. It could be the name of the function in the stats.py file.
  • text: title of the metric to display in the UI selection list. Make it short but intelligible. It should not take up more than one line in the side bar interface.
  • category: an enum value of type MetricCategory, indicating the category this metric belongs to (see MetricCategory Enum source).
  • func: Python function to call in order to compute this metric. Presumably, a function located in the stats.py file.
  • descp: short description (about a paragraph) of what the metric consists of.
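
As a rough illustration of the attributes listed above, the sketch below builds a Metric instance and appends it to the list returned by generate_metrics(). The Metric and MetricCategory classes here are hypothetical stand-ins written only so the example runs on its own. WikiChron's real classes may have a different constructor and different category names, and the EDITS category and the edits_per_month function are assumptions. Keyword arguments are used because the real argument order is not stated here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

import pandas as pd


class MetricCategory(Enum):
    """Hypothetical stand-in for WikiChron's MetricCategory enum."""
    EDITS = 'EDITS'  # assumed category name, for illustration only


@dataclass
class Metric:
    """Hypothetical stand-in exposing the five attributes listed above."""
    code: str
    text: str
    category: MetricCategory
    func: Callable
    descp: str


def edits_per_month(data, index):
    """Illustrative metric function: number of revisions per month."""
    monthly = data.groupby(pd.Grouper(key='timestamp', freq='MS')).size()
    return monthly.reindex(index, fill_value=0)


def generate_metrics():
    """Build and return the list of available metrics."""
    metrics = []
    metrics.append(Metric(
        code='edits_per_month',            # unique id, here the function name
        text='Edits per month',            # short title for the side bar
        category=MetricCategory.EDITS,     # assumed category
        func=edits_per_month,              # function that computes the metric
        descp='Number of revisions made to the wiki in each month.',
    ))
    return metrics


metrics = generate_metrics()
print([metric.code for metric in metrics])
```

In WikiChron itself you would use the project's own Metric and MetricCategory classes instead of these stand-ins.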
Finally, if your new metric belongs to a new category, that is, if you have created another category along with your new metric, you'll need to add that category to the metric_names list in side_bar.py.