first pass on import/export #174
Conversation
@ibnesayeed what are your thoughts? Is object serialization the wrong direction? I'm trying to think through how to make this work with the pre-backend code. |
@parkr is the best practice for releasing a point release, e.g. |
@Ch4s3 I create a branch like |
@parkr thanks! That makes a lot of sense, I was having trouble finding a good reference for this. |
The hash approach is good. That is what I had in mind in the sample example in #172, where I said that rather than passing YAML files as arguments, we should use objects (I meant hashes) in the backend and handle serialization outside it, so that formats other than YAML can be supported. |
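To illustrate that separation, here is a minimal sketch in which the same plain hash is handed to two different serializers. The key names in `data` are assumptions made up for the example, not the library's actual export schema.

```ruby
require 'yaml'
require 'json'

# Hypothetical export data; the key names are illustrative assumptions.
data = {
  'total_words' => 3,
  'categories' => { 'Spam' => { 'buy' => 2, 'now' => 1 } }
}

# The backend only ever sees the hash; the format is chosen outside it.
yaml_text = data.to_yaml
json_text = JSON.generate(data)

# Both formats restore an equal hash when the keys are strings.
from_yaml = YAML.safe_load(yaml_text)
from_json = JSON.parse(json_text)
```

String keys are used on purpose: symbol keys would come back as strings after a JSON round trip, so a hash keyed by symbols would not compare equal to its restored copy.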
Ok. I’ll figure out a roll out strategy later today or tomorrow. |
This is turning out to be a bit tricky to get working with new and old versions. Stay tuned. |
@ibnesayeed can you take a look at this? I have it working with the in-memory backend for the new code, but not with Redis. I'm not sure what I'm missing, though. Once this works with Redis, the backport should be simple. |
I would envision it organized something like this:
# classifier-reborn/lib/classifier-reborn/data_handler/bayes_data_handler.rb
require 'yaml'

module ClassifierReborn
  module DataHandler
    # Load a YAML file and hand the resulting hash to the classifier.
    def import!(classifier_obj, data_file)
      data = YAML.load_file(data_file)
      classifier_obj.import!(data)
    end

    # Ask the classifier for its data hash and write it out as YAML.
    def export(classifier_obj, output_file)
      data = classifier_obj.export
      File.write(output_file, data.to_yaml)
    end
  end
end
# classifier-reborn/lib/classifier-reborn/bayes.rb
module ClassifierReborn
  class Bayes
    def import!(data)
      @backend.import!(data)
    end

    def export
      @backend.export
    end
  end
end
# classifier-reborn/lib/classifier-reborn/backends/bayes_memory_backend.rb
module ClassifierReborn
  class BayesMemoryBackend
    def import!(data)
      reset
      # Iterate over the data and populate the backend
    end

    def export
      # Return a data hash based on the current state of the backend
    end
  end
end
# classifier-reborn/lib/classifier-reborn/backends/bayes_redis_backend.rb
module ClassifierReborn
  class BayesRedisBackend
    def import!(data)
      reset
      # Iterate over the data and populate the backend
    end

    def export
      # Return a data hash based on the current state of the backend
    end
  end
end
Once you have it working on the current version, it should be easier to backport to older versions. |
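As a rough illustration of the layout above, the following sketch round-trips a backend's state through a YAML file. `ToyMemoryBackend`, its `add_words` method, and the hash keys are all hypothetical stand-ins invented for the example. They are not the library's actual classes or schema.

```ruby
require 'yaml'
require 'tempfile'

# Hypothetical stand-in for a memory backend; not the library's actual class.
class ToyMemoryBackend
  def initialize
    reset
  end

  def reset
    @total_words = 0
    @category_word_count = {}
  end

  def add_words(category, count)
    @category_word_count[category] = @category_word_count.fetch(category, 0) + count
    @total_words += count
  end

  # Return a plain hash describing the current state.
  def export
    { 'total_words' => @total_words,
      'category_word_count' => @category_word_count.dup }
  end

  # Discard the current state and repopulate it from the given hash.
  def import!(data)
    reset
    @total_words = data.fetch('total_words')
    data.fetch('category_word_count').each { |category, n| @category_word_count[category] = n }
  end
end

source = ToyMemoryBackend.new
source.add_words('Spam', 5)
source.add_words('Ham', 2)

# Serialization happens outside the backend, as in the data handler sketch.
file = Tempfile.new(['classifier', '.yml'])
File.write(file.path, source.export.to_yaml)

target = ToyMemoryBackend.new
target.import!(YAML.load_file(file.path))
file.close!
```

Because the backend only produces and consumes a hash, swapping YAML for another format would touch only the two lines that write and read the file.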
@ibnesayeed I like that organization a lot. My primary concern at the moment is that I'm a bit stuck with the implementation on the Redis backend. |
If I guess right, you might have trouble fetching all the values (when exporting) that are stored as hashes or nested hashes in the Redis backend (such as ). The following untested pseudocode illustrates what I mean; you can try it in a Pry terminal to see what is going on.
data[:category_training_count] = {}
next_cursor = "0"
loop do
  # HSCAN returns the next cursor and one page of field/value pairs
  next_cursor, records = @redis.hscan(:category_training_count, next_cursor)
  records.each do |k, v|
    data[:category_training_count][k] = v.to_i
  end
  # A cursor of "0" means the scan is complete
  break if next_cursor == "0"
end |
You can use the following private helper method to fetch stored hashes of categories and words in each category.
# classifier-reborn/lib/classifier-reborn/backends/bayes_redis_backend.rb
module ClassifierReborn
  class BayesRedisBackend
    private

    def fetch_hash(name)
      obj = {}
      next_cursor = "0"
      loop do
        next_cursor, records = @redis.hscan(name, next_cursor)
        obj.merge!(records.map { |k, v| [k, v.to_i] }.to_h)
        break if next_cursor == "0"
      end
      obj
    end
  end
end |
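The cursor loop can be exercised without a running Redis server. The sketch below uses `FakeRedis`, a hypothetical stand-in written for this example, whose `hscan` pages through a Ruby hash and returns a `[next_cursor, pairs]` array in the shape the helper above expects. The standalone `fetch_hash(redis, name)` signature is likewise an assumption made so the snippet runs on its own.

```ruby
# Hypothetical stand-in that mimics cursor paging; not a real Redis client.
class FakeRedis
  def initialize(hashes, page_size = 2)
    @hashes = hashes
    @page_size = page_size
  end

  # Returns [next_cursor, [[field, value], ...]]; a cursor of "0" means the scan is done.
  def hscan(name, cursor)
    pairs = @hashes.fetch(name.to_s, {}).to_a
    start = cursor.to_i
    page = pairs[start, @page_size] || []
    next_start = start + @page_size
    next_cursor = next_start >= pairs.size ? '0' : next_start.to_s
    [next_cursor, page]
  end
end

# Same loop as the helper above, with the client passed in explicitly.
def fetch_hash(redis, name)
  obj = {}
  cursor = '0'
  loop do
    cursor, records = redis.hscan(name, cursor)
    records.each { |k, v| obj[k] = v.to_i }
    break if cursor == '0'
  end
  obj
end

redis = FakeRedis.new({ 'category_training_count' => { 'Spam' => '3', 'Ham' => '5', 'Other' => '1' } })
counts = fetch_hash(redis, :category_training_count)
```

With a page size of 2 and three fields, the loop runs twice before the cursor returns to "0", which is the case the single-call approach would miss if the server paged its results.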
It turns out that HGETALL is a more straightforward command for hashes in Redis, so the following code should work just as well (no need to scan through a cursor and merge subsets):
data[:category_training_count] = @redis.hgetall(:category_training_count).transform_values(&:to_i) |
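A small sketch of the conversion step on its own, assuming the client returns a hash of string fields to string values. The `raw` hash is made-up sample data. `Hash#transform_values` was added in Ruby 2.4, so an equivalent form for older Rubies is shown as well, which may matter for the backport.

```ruby
# Assumed shape of an HGETALL result: string fields mapped to string values.
raw = { 'Spam' => '3', 'Ham' => '5' }

# Ruby 2.4 and later.
counts = raw.transform_values(&:to_i)

# Equivalent form for Rubies older than 2.4, where Hash#transform_values is unavailable.
counts_legacy = Hash[raw.map { |k, v| [k, v.to_i] }]
```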
Thanks. I'll work on this a bit more in the next day or so. |
any progress on this PR? |
Trying to address #172
I'm still working through this, but dumping a plain Ruby hash seems like the way to go.
TODO: