Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix how Entry#to_h represents Po entries with multiple lines #28

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rod-murphy
Copy link

@rod-murphy rod-murphy commented Jun 8, 2021

As described in #27 multi-line po entries did not export in such a was as they could be successfully reimported when Entry#to_h was called on each Entry object.

The same thing was happening to multi-line comment entries.

I've changed Entry so that it accepts plural form msgstr entries as msgstr_n instance variable and writer methods, but exports these (via to_h) using original msgstr[n] notations.

Fixes #27

@rod-murphy rod-murphy marked this pull request as draft June 8, 2021 17:43
@arashm
Copy link
Owner

arashm commented Jun 10, 2021

Sorry for late response, I was traveling. I'll have a look this afternoon.

@rod-murphy
Copy link
Author

@arashm it's still WIP - the specs are passing but when I try it in my project, it's not working as expected. Multiline comments are being output incorrectly by to_s (only the first line starts with a # with subsequent lines omitting the #).

Trying to get my head around it, but I think fundamentally the way to_h exports the Po object wrt to plural and multiline msgstr elements may be ambiguous.

I'm hoping to go back to it tomorrow or early next week, I'll try to reproduce that scenario in a spec.

@rod-murphy rod-murphy marked this pull request as ready for review June 15, 2021 09:00
@rod-murphy
Copy link
Author

@arashm this should we resolved now. Let me know if you spot any issues

@rod-murphy
Copy link
Author

@arashm sorry to bug you again. Did you get a chance to look at this. I'd love to just bump the version I have rather than point our app at my fork.

@dfherr
Copy link
Collaborator

dfherr commented Jun 21, 2021

I've changed Entry so that it accepts plural form msgstr entries as msgstr_n instance variable and writer methods, but exports these (via to_h) using original msgstr[n] notations.

@rod-murphy could you elaborate why this is necessary? That looks like an unnecessary hack to me.

@dfherr
Copy link
Collaborator

dfherr commented Jun 21, 2021

To add to my previous question. Using this test file:

# small test po 1
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"


msgid "Test"
msgstr "Test"

msgid ""
"Hello\n"
"there!\n"
msgstr ""
"Yay!\n"
"i like"

msgid "car"
msgid_plural "cars"
msgstr[0] "singular"
msgstr[1] "plural"

Parsing it and using po.entries[2].msgstr[1] returns plural as expected. Why does that not work for you?

@rod-murphy
Copy link
Author

Thanks for the reply @dfherr.

could you elaborate why this is necessary? That looks like an unnecessary hack to me.

As I said above, it's to address the issue where the array produced by Entry#to_h cannot be used to as an input to Entry.add when it came to multi-line messages.

In my mind, if I can export an Po object as and array of hashes and must import entries by passing a hash (or array of hashes), then this should be reversible?

You can see this when you run entry_spec.rb in this PR against lib/poparser/entry.rb in master:

1) PoParser::Entry should show a hash presentation of a plural string entry
     Failure/Error: expect(@entry.to_h).to eq(result)
     
       expected: {:msgid_plural=>"right word", :"msgstr[0]"=>["mot", "juste"], :"msgstr[1]"=>["mots", "justes"], :translator_comment=>["comment", "second line of comments"]}
            got: {:msgid_plural=>"right word", :msgstr_0=>"motjuste", :msgstr_1=>"motsjustes", :translator_comment=>"comment\nsecond line of comments"}
     
       (compared using ==)
     
       Diff:
       
       @@ -1,5 +1,5 @@
       -:"msgstr[0]" => ["mot", "juste"],
       -:"msgstr[1]" => ["mots", "justes"],
        :msgid_plural => "right word",
       -:translator_comment => ["comment", "second line of comments"],
       +:msgstr_0 => "motjuste",
       +:msgstr_1 => "motsjustes",
       +:translator_comment => "comment\nsecond line of comments",
       
     # ./spec/poparser/entry_spec.rb:51:in `block (2 levels) in <top (required)>'

  2) PoParser::Entry should show a hash presentation of an entry
     Failure/Error: expect(@entry.to_h).to eq(result)
     
       expected: {:msgid=>"string", :msgstr=>"reshte", :translator_comment=>["comment", "second line of comments"]}
            got: {:msgid=>"string", :msgstr=>"reshte", :translator_comment=>"comment\nsecond line of comments"}
     
       (compared using ==)
     
       Diff:
       @@ -1,4 +1,4 @@
        :msgid => "string",
        :msgstr => "reshte",
       -:translator_comment => ["comment", "second line of comments"],
       +:translator_comment => "comment\nsecond line of comments",
       
     # ./spec/poparser/entry_spec.rb:35:in `block (2 levels) in <top (required)>'

The resulting hashes are not correctly added to a Po object when PoParser::Po.add is called

@dfherr
Copy link
Collaborator

dfherr commented Jun 21, 2021

@rod-murphy that doesn't really clear up why msgstr_0 and msgstr_1 syntax is necessary? What's wrong with msgstr as an array and accessing with indicies? It seems to me I'm missing a piece of the puzzle to understand why you are adding these, because entry.msgstr properly returns an array and entry.msgstr_0 is a NoMethodError on master (as it should be).

I agree that it makes sense that these are reversible, but why adding another syntax to accessing plural msgstr?

@arashm
Copy link
Owner

arashm commented Jun 21, 2021

I was checking your original issue #27, and as I remember the issue was not being able to parse the header, because the Header expects an array but to_h returns it as a string. I was checking and it seems adding following code to the beginning of Header#convert_msgstr_to_hash(msgstr) would solve your issue:

    def convert_msgstr_to_hash(msgstr)
      value = msgstr.value.is_a?(String) ? msgstr.value.split("\\n") : msgstr.value

      options_array = value.map do |options|
      ...

Wouldn't this be sufficient for your case?

@rod-murphy
Copy link
Author

@rod-murphy that doesn't really clear up why msgstr_0 and msgstr_1 syntax is necessary? What's wrong with msgstr as an array and accessing with indicies? It seems to me I'm missing a piece of the puzzle to understand why you are adding these, because entry.msgstr properly returns an array and entry.msgstr_0 is a NoMethodError on master (as it should be).

The reason this is necessary is because of how writer methods are defined. i.e. it's not possible to have a method named msgstr[0] etc. in the Entry class. We could drop these writer methods and make the code a little simpler.

To better cover all case and make is a little clearer, I've updated the Entry spec to make things a bit clearer with the following test:

  it 'should show a hash representation of a complex entry' do
    # Ensure the to_h method is reversible
    # From SimplePoParser::Parser - https://github.com/experteer/simple_po_parser/blob/v1.1.5/spec/simple_po_parser/parser_spec.rb#L31
    entry_hash = {
      :translator_comment => ["translator-comment", ""],
      :extracted_comment => "extract",
      :reference => ["reference1", "reference2"],
      :flag => "flag",
      :previous_msgctxt => "previous context",
      :previous_msgid => ["", "multiline\\n", "previous messageid"],
      :previous_msgid_plural => "previous msgid_plural",
      :msgctxt => "Context",
      :msgid => "msgid",
      :msgid_plural => ["", "multiline msgid_plural\\n", ""],
      "msgstr[0]" => "msgstr 0",
      "msgstr[1]" => ["", "msgstr 1 multiline 1\\n", "msgstr 1 line 2\\n"],
      "msgstr[2]" => "msgstr 2"
    }

    entry = PoParser::Entry.new(entry_hash)
    expect(entry.to_h).to eq(entry_hash)
  end

Essentially, for Entry#to_h (and thus Po#to_h) to be reversible, it needs to output the same format that SimplePoParser::Parser outputs as this is what gets used when the Po file/string is parsed

@arashm your solution doesn't go far enough. If you run the above spec, you can see that in addition to header entries, strings and multiline forms of msgstr[n] are not exported correctly.

@arashm
Copy link
Owner

arashm commented Jun 24, 2021

Sorry again for the delay, I will hopefully look into it today after work.

@arashm
Copy link
Owner

arashm commented Jun 24, 2021

Your suggestion seems to change the behavior of to_h, hence breaking the public API. I suggest we do something like this. Can you have a look and let me know what do you think?

@rod-murphy
Copy link
Author

Thanks @arashm and again sorry for the delay coming back to you.

Generally your fix will work for the most part (I've added a few comments to your commit), however, when run against's the test hash from SimplePoParser::Parser specs (in my comment above), it's not reversible when it comes to some of the comment fields.

Can you help me understand where I'm changing the behaviour of to_h there are no specs or docs to suggest my suggestion will break things for other users? I agree that I'm suggesting a change to the public API in terms of the assessor methods for plural message strings as we can't otherwise distinguish between multi-line and plural message strings using a msgstr= method.

@arashm
Copy link
Owner

arashm commented Jul 24, 2021

It's kinda weird that every message in this topic start with an excuse for the delay on responses, but anyway, I'm sorry 😅 I (again) totally forgot about this for a while.

Can you help me understand where I'm changing the behaviour of to_h there are no specs or docs to suggest my suggestion will break things for other users?

The behaviour change is exactly about your comment here. The current version of to_h is returning multiline msgstr in as a string, but you now have changed it to return an array. In the current version if I do something like this:

[1] pry(main)> load 'lib/poparser.rb'
=> true
[2] pry(main)> po = PoParser.parse_file('test/complex_entry.po')
=> <PoParser::Po, Translated: 1(100.0%) Untranslated: 0(0.0%) Fuzzy: 0(0.0%)>
[3] pry(main)> po.to_h
=> [{:translator_comment=>"translator-comment\n",
  :extracted_comment=>"extract",
  :reference=>"reference1\nreference2",
  :flag=>"flag",
  :previous_msgctxt=>"previous context",
  :previous_msgid=>"multiline\\nprevious messageid",
  :previous_msgid_plural=>"previous msgid_plural",
  :msgctxt=>"Context",
  :msgid=>"msgid",
  :msgid_plural=>"multiline msgid_plural\\n",
  "msgstr[0]"=>"msgstr 0",
  "msgstr[1]"=>"msgstr 1 multiline 1\\nmsgstr 1 line 2\\n",
  "msgstr[2]"=>"msgstr 2"}]

You can see msgstr[1] is returning a string with line breaks. That's what I'm trying to preserve, since it's been like this, some people may have their code written to expect a string there. Maybe a bad decision since the beginning but we gonna need a major breaking release if we want to change that to an array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Creating new Po from existing Po raises NoMethodError
3 participants