Migrating classifier data from an older classifier-reborn structure #172
Interesting. I guess we should have thought this through before publishing the new code. @ibnesayeed, do you have any thoughts? Without having taken a look yet, I imagine you could do some metaprogramming to open the class and add the backend attribute, but I'm not sure. I'll give it some thought.
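For illustration only, a minimal sketch of that idea might look like the code below. The file name is made up, and it assumes the post-backend gem keeps its storage in an `@backend` instance variable with an in-memory backend class available; even then, the old counts would still live in the pre-backend instance variables, so this alone may not be a complete fix.

```ruby
require 'classifier-reborn'

# Load the classifier that was dumped by the pre-backend gem.
# 'classifier.dat' is a made-up path for illustration.
old_classifier = Marshal.load(File.binread('classifier.dat'))

# Inject a backend if the dumped object predates the backends change.
# Assumes the new Bayes class keeps its storage in @backend and that an
# in-memory backend class (ClassifierReborn::BayesMemoryBackend) exists.
unless old_classifier.instance_variable_defined?(:@backend)
  old_classifier.instance_variable_set(:@backend, ClassifierReborn::BayesMemoryBackend.new)
end

# NOTE: the old word/category counts may still sit in the pre-backend
# instance variables rather than in the new backend, so a data migration
# would likely still be needed on top of this.
```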
Marshaling a complex object is always going to have the potential of breaking compatibility when the data structure or any other attributes change. I remember wasting a month figuring out why a Weka model was not predicting anything, and the culprit turned out to be that the model was built with a different version of Weka than the one being used to load the model file. I think we need to implement an importer/exporter that serializes just the data without tying it to classes or other object state. The output could be something like JSON, YAML, or even Google's Protocol Buffers. This will not only help migrate models from one version to another, but also from one backend to another.
Thank you for the replies. Exporting as YAML seems like a good way for me to move the trained data into the new structure, but I am not familiar enough with the inner workings of the backends code to know the impact this will have. I was able to export a new ClassifierReborn to YAML, use it as a template, and add in my old data and training totals. When re-importing that YAML structure, it looks correct and it can classify data, but training new data still fails. If nothing else, I can use the YAML export to re-train a new classifier, looping over each word for each category.
The point of an object-independent, data-only serialization would be to decouple the data from the class structure and object state. Exporting such a data structure means looping through all the stored keys and serializing them in a backend-independent way. Importing it means populating the backend store with those keys and values from the serialized data rather than loading a ready-made object (as in the case of marshaling).
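To make that contrast concrete, here is a toy illustration; the hash layout is purely illustrative and is not the gem's actual data layout:

```ruby
require 'yaml'
require 'classifier-reborn'

classifier = ClassifierReborn::Bayes.new('Ham', 'Spam')

# Marshal ties the serialized bytes to the class layout and object state at
# dump time, so loading can break when the class changes between versions:
blob = Marshal.dump(classifier)

# A data-only export keeps nothing but keys and counts, so any version or
# backend can re-ingest it. The hash below is a made-up example.
data = {
  'total_words'     => 7,
  'total_trainings' => 3,
  'categories'      => { 'Ham' => { 'work' => 2 }, 'Spam' => { 'winner' => 2 } }
}
File.write('bayes_data.yml', data.to_yaml)
```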
@ibnesayeed I feel like an import/export class is probably the right solution. I can take a crack at it, maybe this weekend. If you want to take a look, feel free.
Once you have a PR in place I will be happy to review it. My current priorities are keeping my hands tied, otherwise I would have implemented it myself.
Sounds good. I'll dig in as soon as I can.
We need to implement the following two methods in each backend and have a proxy/alias method to call them from the main classifier class:

```ruby
def import(yaml_data_file)
  # Read the yaml_data_file and populate the backend in use
end

def export(yaml_data_file)
  # Traverse the data structure in the used backend and serialize it to the yaml_data_file
end
```

Instead of specifying a file name in the parameter, we can supply/return objects and move the serialization/deserialization responsibility into a task or some other method. That way YAML support will not be baked in, and alternate formats can be used without changing the underlying implementation. An exported YAML data file might look something like this:

```yaml
---
# Imported from ClassifierReborn::Bayes
total_words: 7
total_trainings: 3
category_counts:
- Ham:
  - training: 2
  - word: 4
- Spam:
  - training: 1
  - word: 3
categories:
- Ham:
  - sunday: 1
  - holiday: 1
  - work: 2
- Spam:
  - holiday: 1
  - winner: 2
```
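As a rough sketch of how those stubs and the supply/return-objects variant could fit together: the `backend_data` and `populate_from` helpers below are hypothetical names that do not exist in the gem, and the sketch assumes the Bayes class keeps its backend in `@backend`.

```ruby
require 'yaml'
require 'classifier-reborn'

# Sketch only: assume each backend exposes its counts as a plain Hash via a
# hypothetical #backend_data reader and rebuilds itself via a hypothetical
# #populate_from writer. Neither method exists in the gem today.
class ClassifierReborn::Bayes
  # Return the stored counts as a plain Hash; no file format baked in.
  def export
    @backend.backend_data
  end

  # Populate the backend from a Hash produced by #export.
  def import(data)
    @backend.populate_from(data)
  end
end

# Serialization stays outside the classifier, so YAML is just one option:
classifier = ClassifierReborn::Bayes.new('Ham', 'Spam')
File.write('bayes_data.yml', classifier.export.to_yaml)
classifier.import(YAML.safe_load(File.read('bayes_data.yml')))
```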
I'm trying to think if we need to do a minor release of a pre-backend version to make this work. Thoughts?
That's a good idea indeed. This feature can be released as minor versions for both the pre- and post-backend releases at the same time.
OK, I'll try building it against 2.1 and releasing this as 2.1.1 and 2.2.1. It kind of breaks semver to add new functionality in a patch version, but I don't see a way around that.
Yes, the backend change was big enough to warrant a major version bump, but we couldn't see it coming. So, for now, 2.1.1 and 2.2.1 will do the trick if we aren't too religious about semver.
I agree. I'll take a look tonight and see if I can pull together a PoC.
OK, I have a WIP PR at #174.
Hi all,
I am attempting to upgrade a classifier that I built using a previous version of ClassifierReborn::Bayes. It looks like I initialized the classifier before backends were added, so now I am running into compatibility issues. I store the classifier structure on disk with Marshal, and now when it is loaded it does not have the `backend` attribute that the newer gem expects. Is there a best practice for how to update the older classifier so that it will be compatible with the backends system?
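For context, the persistence pattern described above is roughly the following; the file name is made up for illustration.

```ruby
require 'classifier-reborn'

# Built and trained with the pre-backend gem, then dumped to disk
# ('classifier.dat' is a made-up path):
classifier = ClassifierReborn::Bayes.new('Ham', 'Spam')
classifier.train('Ham', 'Sunday holiday work')
File.binwrite('classifier.dat', Marshal.dump(classifier))

# Later, loaded under the post-backend gem, the revived object carries no
# backend attribute, so calls that go through it (such as further training)
# may fail.
classifier = Marshal.load(File.binread('classifier.dat'))
```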