Avoid loading samples when Classifier isn't used (#4540)
samples.json is a very large (3.7M) JSON file. Loading it allocates
183209 Ruby strings, which take 8.2MB of memory according to memsize_of.

Rather than keeping the cache around when loading language.rb, we can
just load the JSON and allow it to be GC'd after we use it.
The cache will still be used if the Classifier is invoked.
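
The message does not show how the string count and memory figure were obtained. One way to get comparable numbers, sketched here under assumptions, is to walk the parsed structure and sum `ObjectSpace.memsize_of` over every string reached. The `each_string` helper and the tiny inline document are hypothetical stand-ins for the real samples.json, and the stdlib `json` parser is used here rather than Yajl.

```ruby
require 'json'
require 'objspace'

# Hypothetical helper: yield every String reachable from a parsed JSON value,
# including hash keys. A string object shared between several places is
# yielded once per place it is reached.
def each_string(obj, &block)
  case obj
  when String
    yield obj
  when Array
    obj.each { |value| each_string(value, &block) }
  when Hash
    obj.each do |key, value|
      each_string(key, &block)
      each_string(value, &block)
    end
  end
end

# Assumption: a tiny document shaped like the two keys language.rb reads.
doc = JSON.generate(
  'extnames'     => { 'Ruby' => ['.rb', '.rake'], 'Python' => ['.py'] },
  'interpreters' => { 'Ruby' => ['ruby'], 'Python' => ['python', 'python3'] }
)

parsed = JSON.parse(doc)

count = 0
bytes = 0
each_string(parsed) do |string|
  count += 1
  bytes += ObjectSpace.memsize_of(string)
end

puts "#{count} strings, #{bytes} bytes according to memsize_of"
```

The byte total depends on the Ruby version and build, so it is best compared only across runs on the same interpreter.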
jhawthorn authored and lildude committed May 31, 2019
1 parent d95bae7 commit 071e04a
Showing 2 changed files with 11 additions and 7 deletions.
5 changes: 3 additions & 2 deletions lib/linguist/language.rb
@@ -505,8 +505,9 @@ def inspect
     end
   end

-  extensions = Samples.cache['extnames']
-  interpreters = Samples.cache['interpreters']
+  samples = Samples.load_samples
+  extensions = samples['extnames']
+  interpreters = samples['interpreters']
   popular = YAML.load_file(File.expand_path("../popular.yml", __FILE__))

   languages_yml = File.expand_path("../languages.yml", __FILE__)
13 changes: 8 additions & 5 deletions lib/linguist/samples.rb
@@ -17,12 +17,15 @@ module Samples
     # Path for serialized samples db
     PATH = File.expand_path('../samples.json', __FILE__)

-    # Hash of serialized samples object
+    # Hash of serialized samples object, cached in memory
     def self.cache
-      @cache ||= begin
-        serializer = defined?(Yajl) ? Yajl : YAML
-        serializer.load(File.read(PATH, encoding: 'utf-8'))
-      end
+      @cache ||= load_samples
+    end
+
+    # Hash of serialized samples object, uncached
+    def self.load_samples
+      serializer = defined?(Yajl) ? Yajl : YAML
+      serializer.load(File.read(PATH, encoding: 'utf-8'))
     end

     # Public: Iterate over each sample.
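
The split between a memoized `cache` and an uncached `load_samples` can be tried in isolation. The following is a minimal sketch, not the Linguist code itself: the `SamplesSketch` module, the temp file standing in for samples.json, and the use of the stdlib `json` parser are all assumptions made so the example runs on its own.

```ruby
require 'json'
require 'tempfile'

module SamplesSketch
  # Assumption: a small temp file stands in for samples.json.
  # Holding the Tempfile in a constant keeps it from being unlinked early.
  FILE = Tempfile.new(['samples', '.json'])
  FILE.write(JSON.generate(
    'extnames'     => { 'Ruby' => ['.rb'] },
    'interpreters' => { 'Ruby' => ['ruby'] }
  ))
  FILE.flush
  PATH = FILE.path

  # Memoized: the parsed Hash stays referenced by @cache for the life of the process.
  def self.cache
    @cache ||= load_samples
  end

  # Uncached: every call parses the file again and returns a new Hash,
  # which can be collected once the caller drops its reference.
  def self.load_samples
    JSON.parse(File.read(PATH, encoding: 'utf-8'))
  end
end

first  = SamplesSketch.load_samples
second = SamplesSketch.load_samples
puts first.equal?(second)                              # prints "false"
puts SamplesSketch.cache.equal?(SamplesSketch.cache)   # prints "true"

# A caller that only needs two keys can keep those and let the rest go.
extensions   = first['extnames']
interpreters = first['interpreters']
```

Keeping only `extensions` and `interpreters` means the rest of the parsed document is no longer reachable once `first` and `second` go out of scope, which is the effect the commit relies on.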