Better implementation (memory-wise) #1

Open
yawnoc opened this issue Jan 26, 2022 · 1 comment
Labels
backburner Some day

Comments

@yawnoc
Member

yawnoc commented Jan 26, 2022

I don't know very much about programming, so I have produced a rather crappy implementation of the stroke input method.

In particular, I load all the stroke data into memory when the input method service is instantiated:

@Override
public void onCreate()
{
  super.onCreate();
  loadSequenceCharactersDataIntoMap(SEQUENCE_CHARACTERS_FILE_NAME, charactersFromStrokeDigitSequence);
  loadCharactersIntoCodePointSet(CHARACTERS_FILE_NAME_TRADITIONAL, codePointSetTraditional);
  loadCharactersIntoCodePointSet(CHARACTERS_FILE_NAME_SIMPLIFIED, codePointSetSimplified);
  loadRankingData(RANKING_FILE_NAME_TRADITIONAL, sortingRankFromCodePointTraditional, commonCodePointSetTraditional);
  loadRankingData(RANKING_FILE_NAME_SIMPLIFIED, sortingRankFromCodePointSimplified, commonCodePointSetSimplified);
  loadPhrasesIntoSet(PHRASES_FILE_NAME_TRADITIONAL, phraseSetTraditional);
  loadPhrasesIntoSet(PHRASES_FILE_NAME_SIMPLIFIED, phraseSetSimplified);
  updateCandidateOrderPreference();
}

It works, but the downsides are:

  • It takes a long time to load on lower-end devices (e.g. takes 1.2 seconds on my cheap phone with ~1.3 GB RAM)
  • It requires a lot of memory

  1. Most of the time is spent during loadSequenceCharactersDataIntoMap. Is there a better way of reading a TSV than what I currently have?

    @SuppressWarnings("SameParameterValue")
    private void loadSequenceCharactersDataIntoMap(
      final String sequenceCharactersFileName,
      final Map<String, String> charactersFromStrokeDigitSequence
    )
    {
      final long startMilliseconds = System.currentTimeMillis();
      try
      {
        final InputStream inputStream = getAssets().open(sequenceCharactersFileName);
        final BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
        String line;
        while ((line = bufferedReader.readLine()) != null)
        {
          if (!isCommentLine(line))
          {
            final String[] sunderedLineArray = Stringy.sunder(line, "\t");
            final String strokeDigitSequence = sunderedLineArray[0];
            final String characters = sunderedLineArray[1];
            charactersFromStrokeDigitSequence.put(strokeDigitSequence, characters);
          }
        }
      }
      catch (IOException exception)
      {
        exception.printStackTrace();
      }
      final long endMilliseconds = System.currentTimeMillis();
      sendLoadingTimeLog(sequenceCharactersFileName, startMilliseconds, endMilliseconds);
    }

  2. Alternatively, given that the stroke data is a constant map, is it possible to bake it into the class so that I don't need to load it every time the service starts? Or can we do something completely different that isn't so memory-intensive?
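For question 1, here is a minimal sketch of one possible tweak to the reading loop, not a measured improvement. It splits each line at the first tab with `indexOf` instead of building an array, presizes the map so it does not have to grow while loading, and closes the reader with try-with-resources. The class name `SequenceDataLoader`, the `expectedEntryCount` parameter, and the rule that comment lines start with `#` are all assumptions made for illustration; the real `isCommentLine` and `Stringy.sunder` may behave differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of the existing code base.
final class SequenceDataLoader
{
  // Assumption: comment lines start with '#'. The app's own isCommentLine may differ.
  static boolean isCommentLine(final String line)
  {
    return line.startsWith("#");
  }

  static Map<String, String> loadSequenceCharacters(
    final Reader reader,
    final int expectedEntryCount
  )
  throws IOException
  {
    // Presize the map so that it does not need to rehash while loading.
    final Map<String, String> charactersFromStrokeDigitSequence =
      new HashMap<>((int) (expectedEntryCount / 0.75f) + 1);

    // try-with-resources closes the reader even if reading fails part-way.
    try (BufferedReader bufferedReader = new BufferedReader(reader, 1 << 16))
    {
      String line;
      while ((line = bufferedReader.readLine()) != null)
      {
        if (line.isEmpty() || isCommentLine(line))
        {
          continue;
        }
        // Split at the first tab only, without allocating an array.
        final int tabIndex = line.indexOf('\t');
        if (tabIndex < 0)
        {
          continue; // skip lines that have no tab rather than crash
        }
        final String strokeDigitSequence = line.substring(0, tabIndex);
        final String characters = line.substring(tabIndex + 1);
        charactersFromStrokeDigitSequence.put(strokeDigitSequence, characters);
      }
    }
    return charactersFromStrokeDigitSequence;
  }

  public static void main(final String[] args) throws IOException
  {
    // Made-up sample data in the assumed "sequence<TAB>characters" layout.
    final String sample = "# comment line\n1\t\u4e00\n12\t\u5341\n";
    final Map<String, String> map = loadSequenceCharacters(new StringReader(sample), 2);
    System.out.println(map.size()); // prints 2
  }
}
```

In the service, the reader could be created as `new InputStreamReader(getAssets().open(fileName), StandardCharsets.UTF_8)`, which also pins the encoding instead of relying on the platform default. Whether any of this shortens the 1.2 seconds would need to be timed on the device.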

@yawnoc
Member Author

yawnoc commented Feb 4, 2022

According to tadfisher on Reddit:

To avoid loading the whole dataset, I would suggest that you ship it as a sqlite database. That way you can add index columns based on prefix characters and let sqlite filter candidates for you. A script to build a sqlite3 database and populate it with the Conway dataset should be a reasonable project for a beginner.
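To make the "let sqlite filter candidates" idea concrete, here is a rough sketch of the lookup shape only. It does not use SQLite and still holds the data in memory: it performs a prefix range scan over sorted keys with a `TreeMap`, which is the same kind of range query an indexed text column would allow. The class and method names are hypothetical, and the table and column names in the SQL comment are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration; not the suggested SQLite implementation itself.
final class PrefixLookupSketch
{
  // Smallest string that sorts after every string starting with the prefix,
  // e.g. "12" -> "13". Requires a non-empty prefix.
  static String exclusiveUpperBound(final String prefix)
  {
    final int lastIndex = prefix.length() - 1;
    return prefix.substring(0, lastIndex) + (char) (prefix.charAt(lastIndex) + 1);
  }

  // Roughly equivalent SQL, with made-up table and column names:
  //   SELECT characters FROM sequence_characters
  //   WHERE sequence >= ? AND sequence < ?
  //   ORDER BY sequence;
  static List<String> candidatesForPrefix(
    final NavigableMap<String, String> charactersFromStrokeDigitSequence,
    final String prefix
  )
  {
    final List<String> candidates = new ArrayList<>();
    if (prefix.isEmpty())
    {
      return candidates; // nothing typed yet, so no candidates
    }
    final String upperBound = exclusiveUpperBound(prefix);
    // Range scan: every key k with prefix <= k < upperBound starts with the prefix.
    candidates.addAll(
      charactersFromStrokeDigitSequence.subMap(prefix, true, upperBound, false).values()
    );
    return candidates;
  }

  public static void main(final String[] args)
  {
    // Placeholder values stand in for real character data.
    final NavigableMap<String, String> map = new TreeMap<>();
    map.put("1", "A");
    map.put("12", "B");
    map.put("125", "C");
    map.put("13", "D");
    map.put("2", "E");
    System.out.println(candidatesForPrefix(map, "12")); // prints [B, C]
  }
}
```

With a real database, only the rows inside that range would be read from disk, which is where the memory saving would come from; the `TreeMap` here just shows that the bounds select exactly the keys beginning with the typed sequence.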

yawnoc added the "backburner" (Some day) label and removed the "help wanted" label on Feb 5, 2022.