Aws::Xml::Parser::ParsingError when listing objects #3111

Closed
1 task
krystof-k opened this issue Sep 23, 2024 · 12 comments
Labels
guidance Question that needs advice or information. third-party This issue is related to third-party libraries or applications.

Comments

@krystof-k

Describe the bug

I get an Aws::Xml::Parser::ParsingError when listing objects with #list_objects_v2.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

It works.

Current Behavior

It throws the error:

response = s3_client.list_objects_v2(
  bucket:,
  prefix:,
  continuation_token:
)

/app/vendor/bundle/ruby/3.2.0/gems/aws-sdk-core-3.203.0/lib/aws-sdk-core/xml/parser/stack.rb:49:in `error': xmlParseCharRef: invalid xmlChar value 12 (Aws::Xml::Parser::ParsingError)

Reproduction Steps

I haven't figured out how to reproduce it. It happens only for some prefixes and could be related to filenames.

Possible Solution

No response

Additional Information/Context

No response

Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version

aws-sdk-s3 1.160.0

Environment details (Version of Ruby, OS environment)

Ruby 3.2.5, Alpine 3.20

@krystof-k krystof-k added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 23, 2024
@alextwoods
Contributor

What XML gem are you using? (You can check which XML "engine" is used at runtime with Aws::Xml::Parser.engine; it should return something like Aws::Xml::Parser::OxEngine.)

Any idea which prefixes might be causing the issue? Do you have any unusual object names in the bucket you're trying to list? Does this happen every time (accounting for paging), or are you sometimes able to get all pages of the response?

@krystof-k
Author

Wow, that was fast :)

Aws::Xml::Parser.engine says Aws::Xml::Parser::NokogiriEngine.

It happens every time, but probably only for a certain page. Unfortunately I have no idea how to retrieve any further information; all I have is the continuation token.

And yes, it is possible there is some unusual character in the object name, it is happening for at least two files (unfortunately I don't know which ones).
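
When only the continuation token is known, one way to narrow things down is to walk the listing page by page and record where parsing first fails. The following is a minimal sketch, assuming a client that responds to list_objects_v2 the way Aws::S3::Client does. The name find_failing_page, its error_class parameter, and the shape of the returned hash are hypothetical and not part of the SDK.

```ruby
# Walk a list_objects_v2 listing and report the first page that raises.
# `client` is assumed to behave like Aws::S3::Client (hypothetical helper).
def find_failing_page(client, bucket:, prefix: nil, error_class: StandardError)
  token = nil          # continuation token used for the current request
  last_good_key = nil  # last key seen on a page that parsed successfully
  page = 0

  loop do
    page += 1
    params = { bucket: bucket }
    params[:prefix] = prefix if prefix
    params[:continuation_token] = token if token

    begin
      resp = client.list_objects_v2(**params)
    rescue error_class => e
      # The page requested with `token` could not be fetched or parsed.
      return {
        page: page,
        continuation_token: token,
        last_good_key: last_good_key,
        error: e.message
      }
    end

    last_good_key = resp.contents.last.key unless resp.contents.empty?
    return nil unless resp.is_truncated # whole listing parsed cleanly

    token = resp.next_continuation_token
  end
end
```

With the real SDK one would presumably pass error_class: Aws::Xml::Parser::ParsingError. The returned last_good_key could then be used as start_after together with a small max_keys to isolate the offending key.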

@alextwoods
Contributor

I'm trying to reproduce this, but haven't been able to figure out what characters might be responsible yet. A few things that might help:

  1. You can enable wire logs with the http_wire_trace option, e.g. s3_client = Aws::S3::Client.new(http_wire_trace: true). That logs the raw XML response and should show which keys are actually failing.
  2. You can switch XML engines by installing another gem (e.g. add gem 'ox' to your Gemfile). The SDK selects engines in this order of preference: ox, oga, libxml, nokogiri, then rexml. Different XML parsers handle some characters differently.

@alextwoods alextwoods self-assigned this Sep 23, 2024
@alextwoods alextwoods added the investigating Issue is being investigated label Sep 23, 2024
@krystof-k
Author

Awesome, thanks. Now I can see that it is probably because of a corrupted filename. This would be the problematic object:

<Contents>
  <Key>
    room-tyj62m1bnw14/storage/files/2022/07/16/C1WECz5UQ8Xw82vzd7yZj8/\xEF\xBF\xBD&#x7;\xEF\xBF\xBDo&#xb;\xEF\xBF\xBDu\xEF\xBF\xBD]onf\xEF\xBF\xBDI\xDC\x99\xD6\xB3\xEF\xBF\xBD\xEF\xBF\xBD\xEF\xBF\xBDN.r\n\xEF\xBF\xBD</Key>
  <LastModified>2024-09-10T09:00:16.000Z</LastModified>
  <ETag>&quot;be2142edba71363ab371527b7865056c&quot;</ETag>
  <Size>86623</Size>
  <StorageClass>STANDARD</StorageClass>
</Contents>

However I can download the file (with the corrupted filename) from S3 console with no issue. So I'm not sure what is the right approach to this.
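
For context, the key in the wire log contains numeric character references such as &#x7; and &#xb;, and the error message mentions value 12 (a form feed). XML 1.0 only permits the code points #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF, so a strict parser rejects references to other control characters. Below is a rough sketch for scanning a captured response body for such references; xml10_char? and invalid_char_refs are hypothetical helper names.

```ruby
# True if the code point is allowed by the XML 1.0 "Char" production.
def xml10_char?(codepoint)
  [0x9..0xA, 0xD..0xD, 0x20..0xD7FF, 0xE000..0xFFFD, 0x10000..0x10FFFF]
    .any? { |range| range.cover?(codepoint) }
end

# Returns the numeric character references in `xml` (hex or decimal form)
# that point at code points XML 1.0 does not allow.
def invalid_char_refs(xml)
  xml.scan(/&#(?:x\h+|\d+);/).select do |ref|
    body = ref[2..-2] # strip the leading "&#" and the trailing ";"
    codepoint = body.start_with?("x") ? body[1..].to_i(16) : body.to_i
    !xml10_char?(codepoint)
  end
end

puts invalid_char_refs("<Key>a&#x7;b&#xb;c&#xA;d&#12;</Key>").inspect
```

Here &#xA; (a newline) is allowed and therefore not reported, while &#x7;, &#xb; and &#12; are.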

@alextwoods
Contributor

Can you try using a different XML gem? I believe there are differences in what the underlying XML parsers consider valid.

@alextwoods
Contributor

May be related to #3081

@krystof-k
Author

Well, that will be a little bit complicated but I'll try.

@mullermp
Contributor

What specifically is complicated? You should just be able to add the 'ox' gem to your Gemfile and the SDK will prefer it.

@krystof-k
Author

@mullermp Our production deployment pipeline :)

@alextwoods I tried with Ox and it seems to work just fine; the response (object key) is the same.


This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

@krystof-k
Author

OK, so does this mean it is a "bug" in Nokogiri and the solution is to use another XML parser?

@mullermp
Contributor

I don't know if it's a bug or not, but it's in a grey area for sure. Some parsers think it's invalid and others don't. I think Nokogiri is just stricter about the specification. This isn't the first time we've seen this. S3 doesn't strictly follow XML rules either. Either way, I would consider adding validation to prevent object names from containing invalid characters.
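
As an illustration of that kind of validation, here is a minimal sketch that checks a key before upload. It assumes "invalid" means "outside the XML 1.0 character range or not valid UTF-8", which is one possible reading of the advice above; xml_safe_key? is a hypothetical name, not an SDK method.

```ruby
# Returns true if every character of `key` is valid UTF-8 and allowed by
# the XML 1.0 "Char" production (hypothetical helper; the definition of
# "invalid" is an assumption).
def xml_safe_key?(key)
  utf8 = key.dup.force_encoding(Encoding::UTF_8)
  return false unless utf8.valid_encoding?

  utf8.codepoints.all? do |cp|
    cp == 0x9 || cp == 0xA || cp == 0xD ||
      (0x20..0xD7FF).cover?(cp) ||
      (0xE000..0xFFFD).cover?(cp) ||
      (0x10000..0x10FFFF).cover?(cp)
  end
end

["storage/files/report.pdf", "storage/files/bad\u0007name"].each do |key|
  puts "#{key.inspect}: #{xml_safe_key?(key) ? 'ok' : 'rejected'}"
end
```

An application could run a check like this before calling put_object and reject or rename keys that fail it. A stricter policy that also excludes tabs and newlines may be preferable for file names.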

@alextwoods alextwoods added third-party This issue is related to third-party libraries or applications. guidance Question that needs advice or information. and removed investigating Issue is being investigated needs-triage This issue or PR still needs to be triaged. bug This issue is a bug. labels Sep 30, 2024