
Dictionary contribution: 擬音語・擬態語辞典 #83

Open
dubai03nsr opened this issue Feb 20, 2025 · 9 comments

Comments

@dubai03nsr

dubai03nsr commented Feb 20, 2025

I'm contributing the dictionary 擬音語・擬態語辞典 ("Dictionary of Onomatopoeic and Mimetic Words"; Amazon JP). It contains 1967 entries for mimetic words (aka onomatopoeia).

擬音語・擬態語辞典.zip
擬音語・擬態語辞典.zip (without structured content)

This lightweight dictionary specializes in mimetic words and covers them in more depth than general-purpose dictionaries. Its example sentences tend to be more fleshed out and drawn from relatively modern texts, and its entries include extensive comparisons with related words.

Examples

いけしゃーしゃー:
Image

いけしゃーしゃー (original):
Image

よろよろ:
Image

Format

Each definition is followed by an example sentence (with a few exceptions). Definitions are numbered when there are several. The optional field 類義語 (synonyms) explains how the word relates to similar words, with links to their entries when available. The optional field "➜" points to entries that refer to the current one in their 類義語 field. The optional field 参考 (notes) provides additional information.
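
Since the "➜" field is the reverse of the 類義語 links, it can be derived mechanically once those links are known. A minimal Python sketch, assuming a hypothetical mapping from each headword to the headwords its 類義語 field links to (this only illustrates the relationship and is not necessarily how the dictionary was actually built):

```python
from __future__ import annotations

from collections import defaultdict


def build_back_references(synonym_links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Derive the "➜" field: for each headword, the entries whose 類義語 link to it.

    synonym_links (hypothetical input) maps every headword in the dictionary to
    the headwords named in its own 類義語 field.
    """
    back_refs: dict[str, list[str]] = defaultdict(list)
    for source, targets in synonym_links.items():
        for target in targets:
            # Skip words that have no entry of their own, and self-references.
            if target in synonym_links and target != source:
                back_refs[target].append(source)
    # Sort for stable output (plain code-point order).
    return {headword: sorted(sources) for headword, sources in back_refs.items()}


if __name__ == "__main__":
    # Hypothetical data: あーん lists わーん and あんあん in its 類義語 field.
    links = {"あーん": ["わーん", "あんあん"], "わーん": [], "あんあん": []}
    print(build_back_references(links))
    # prints {'わーん': ['あーん'], 'あんあん': ['あーん']}
```

A 類義語 word without its own entry gets no "➜" field, which matches the note that links are only present "when available".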

Creation of the dictionary

As the dictionary is not available in digital form, I transcribed it manually with the help of OCR. I read the physical dictionary alongside the OCR output and corrected any mistakes. I then proofread the entire Yomitan dictionary and fixed the remaining mistakes I found. I believe there are now almost no mistakes, especially none that would meaningfully affect your use of the dictionary. If you suspect a mistake (and don't have the original dictionary), I'd be glad to check for you. Note that the example sentences come from texts that may be older than what you're used to reading, and they contain unusual okurigana, spellings, etc., which I intentionally preserved where I believed they had minimal impact on comprehensibility.

Below are some notes on how I converted the original dictionary entries into Yomitan entries. I omitted historical usage, dialectal usage, information I deemed too trivial or obscure, and the source citations for example sentences. I sometimes replaced words with their more common spellings, and sometimes simplified an example sentence without reducing its usefulness as an example. A few example sentences come from Massif or Weblio, in cases where I couldn't make sense of the dictionary's own example. When there were multiple good example sentences, I placed the extra ones with the definition. There is generally no pitch-accent information, but where the dictionary points out a pitch-accent distinction, I mention it.

@rampaa

rampaa commented Mar 21, 2025

@dubai03nsr Thanks for creating this! If you don't mind, I have a few questions to ask:

  1. What's the main difference between the JL version and the Yomitan version? I'm guessing the content is the same and only the styling differs, but I'd like to confirm just in case.

  2. At a quick glance, the Yomitan version looks better in JL than the JL version does (at least to me). See the image below:
    Image

Do you have any entries in mind where JL badly breaks the styling of the Yomitan version? If it's doable, I can try to change how JL parses structured content to improve those entries (unfortunately, this is quite hard, so I can't promise I'll actually manage it; see rampaa/JL#97 for more information).

Also, an unrelated suggestion: if you care about apps that don't use a web view, like JL, creating a dictionary without structured content is a better idea than creating a curated version with structured content for those apps. Those apps may change how they convert structured content to plain text, which could inadvertently degrade how your dictionary's entries are displayed. So something like:

    [
        "あーん",
        "",
        "",
        "",
        0,
        [
            "子供などが声を張り上げて泣く声。甘えがある。\n「犬に噛まれたと言って、子供のように『ああん、ああん』と泣きながら」",
            "口を大きく開ける様子。歯の治療時などに医者に言われる。\n「さあ、口をあーんと開いて」\n類義語「わーん」「あんあん」\n共に、①の類義語。「わーん」には甘えが少ない。「あんあん」は声の張り上げ方が弱い。"
        ],
        0,
        ""
    ]

or

    [
        "あーん",
        "",
        "",
        "",
        0,
        [
            "①子供などが声を張り上げて泣く声。甘えがある。\n「犬に噛まれたと言って、子供のように『ああん、ああん』と泣きながら」\n②口を大きく開ける様子。歯の治療時などに医者に言われる。\n「さあ、口をあーんと開いて」\n類義語「わーん」「あんあん」\n共に、①の類義語。「わーん」には甘えが少ない。「あんあん」は声の張り上げ方が弱い。"
        ],
        0,
        ""
    ]

would be preferable.

The first version would let JL users choose between putting each definition on a new line and a more compact look (via the Manage dictionaries -> Edit -> Newline between definitions option). The second version would not allow that, but it has other advantages: it is likely compatible with a wider range of software, and it gives ever so slightly faster lookups and slightly lower memory usage in JL.
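
A minimal Python sketch of generating both plain-text variants described here. It assumes the Yomitan v3 term-bank row layout used in the examples (term, reading, definition tags, rules, score, glossary, sequence, term tags) and a hypothetical input of (definition, example sentence) pairs plus an optional trailing 類義語/参考 block; the function names are illustrative and not part of any existing tool:

```python
from __future__ import annotations

import json

# Circled numbers ① to ⑳ (U+2460 to U+2473); this sketch assumes no entry has more than 20 senses.
CIRCLED_NUMBERS = "①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮⑯⑰⑱⑲⑳"


def term_row(term: str, glossary: list[str]) -> list:
    """One v3 term-bank row: reading, tags and rules empty, score and sequence 0."""
    return [term, "", "", "", 0, glossary, 0, ""]


def sense_text(definition: str, example: str) -> str:
    """A definition followed by its example sentence in 「」 on the next line."""
    return f"{definition}\n「{example}」"


def row_per_sense(term: str, senses: list[tuple[str, str]], extra: str = "") -> list:
    """Variant 1: one plain-text glossary string per sense.

    `extra` (e.g. the 類義語/参考 block) is appended to the last sense.
    """
    glossary = [sense_text(d, e) for d, e in senses]
    if extra and glossary:
        glossary[-1] += "\n" + extra
    return term_row(term, glossary)


def row_single_string(term: str, senses: list[tuple[str, str]], extra: str = "") -> list:
    """Variant 2: everything in one string, senses numbered ①, ②, ... if there are several."""
    numbered = len(senses) > 1
    parts = [
        (CIRCLED_NUMBERS[i] if numbered else "") + sense_text(d, e)
        for i, (d, e) in enumerate(senses)
    ]
    if extra:
        parts.append(extra)
    return term_row(term, ["\n".join(parts)])


def term_bank_json(rows: list[list]) -> str:
    """Serialize rows as a term-bank JSON array, keeping Japanese text unescaped."""
    return json.dumps(rows, ensure_ascii=False, indent=4)
```

Fed the two あーん senses and the 類義語 block, `row_per_sense` and `row_single_string` produce the glossaries of the first and second example respectively; `ensure_ascii=False` keeps the kana readable in the output file instead of `\uXXXX` escapes.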

Anyway, I don't know what it takes to create a Yomitan dictionary, so if creating a version without using any structured content is somehow bothersome, feel free to ignore what I've suggested. Again, thanks for creating this dictionary!

@dubai03nsr
Author

@rampaa Thanks for looking into this!

I realize now that I was using an old version of JL; with the latest version, the Yomitan dictionary displays in JL as intended. I will remove the JL version from this post. The problem I saw with the old JL version when using the Yomitan dictionary was that structured content introduced line breaks.

@rampaa

rampaa commented Mar 22, 2025

@dubai03nsr Thank you for creating this! FWIW I still think having a version without any structured content would be quite valuable.

@dubai03nsr
Author

@rampaa Sure, I've added it back!

@rampaa

rampaa commented Mar 22, 2025

@dubai03nsr Sorry, I probably didn't express myself clearly. That version still uses structured content and ends up with undesirable styling in the up-to-date JL version. What I meant is that instead of having entries like

    [
        "あーん",
        "",
        "",
        "",
        0,
        [
            {
                "type": "structured-content",
                "content": [
                    {
                        "tag": "span",
                        "content": "①子供などが声を張り上げて泣く声。甘えがある。"
                    },
                    {
                        "tag": "span",
                        "content": "「犬に噛まれたと言って、子供のように『ああん、ああん』と泣きながら」"
                    },
                    {
                        "tag": "span",
                        "content": "②口を大きく開ける様子。歯の治療時などに医者に言われる。"
                    },
                    {
                        "tag": "span",
                        "content": "「さあ、口をあーんと開いて」"
                    },
                    {
                        "tag": "span",
                        "content": "類義語「わーん」「あんあん」"
                    },
                    {
                        "tag": "span",
                        "content": "共に、①の類義語。「わーん」には甘えが少ない。「あんあん」は声の張り上げ方が弱い。"
                    }
                ]
            }
        ],
        0,
        ""
    ]

having entries like

    [
        "あーん",
        "",
        "",
        "",
        0,
        [
            "①子供などが声を張り上げて泣く声。甘えがある。\n「犬に噛まれたと言って、子供のように『ああん、ああん』と泣きながら」\n②口を大きく開ける様子。歯の治療時などに医者に言われる。\n「さあ、口をあーんと開いて」\n類義語「わーん」「あんあん」\n共に、①の類義語。「わーん」には甘えが少ない。「あんあん」は声の張り上げ方が弱い。"
        ],
        0,
        ""
    ]

would be quite valuable. Sorry for the confusion.
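
A minimal Python sketch of that conversion for a single v3 term-bank row, assuming (as in the あーん example) that each top-level child of the structured content is one line of text. Images and other non-text nodes are simply dropped, and the helper names are hypothetical:

```python
from __future__ import annotations


def node_text(node) -> str:
    """Concatenate all text inside a structured-content node, with no separators."""
    if isinstance(node, str):
        return node
    if isinstance(node, list):
        return "".join(node_text(child) for child in node)
    if isinstance(node, dict):
        # Element nodes ({"tag": ..., "content": ...}) keep their children under "content";
        # nodes without "content" (e.g. images) contribute no text.
        return node_text(node.get("content"))
    return ""


def to_plain_glossary(glossary: list) -> list[str]:
    """Turn each glossary item into a plain string."""
    plain = []
    for item in glossary:
        if isinstance(item, str):
            plain.append(item)
        elif isinstance(item, dict) and item.get("type") == "text":
            plain.append(item.get("text", ""))
        elif isinstance(item, dict) and item.get("type") == "structured-content":
            children = item.get("content")
            if not isinstance(children, list):
                children = [children]
            # Assumption: each top-level child is one line, as in the あーん entry.
            plain.append("\n".join(node_text(child) for child in children))
        # Other item types are dropped in this sketch.
    return plain


def to_plain_row(row: list) -> list:
    """Copy a v3 term-bank row, replacing only its glossary (index 5)."""
    return row[:5] + [to_plain_glossary(row[5])] + row[6:]
```

Applied to the structured-content entry quoted first, `to_plain_row` yields the single-string entry quoted second. Text nested inside a span is concatenated inline rather than split across lines, so the one-line-per-child assumption only governs the top level.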

@dubai03nsr
Author

@rampaa I see, I have updated it now.

@ttsfan

ttsfan commented Mar 26, 2025

What impressive work! Thank you for your dedication! I was wondering whether there are plans to create a complete version that includes all the content of this dictionary.

@dubai03nsr
Author

@ttsfan Thanks for your interest! Unfortunately, I have no plans to create a version with all the contents of the original dictionary. I chose which content to include in the Yomitan dictionary / Anki deck based on (1) the labor cost of transferring it and (2) the level of concision I believed works best for lookups and learning. I refer interested readers to the original dictionary for its full contents.

@ttsfan

ttsfan commented Mar 27, 2025

Got it, thank you for your response!
