How to decode POST requests with different charset #2196
I have an old application that I'm writing a back-end server for. The application sends its data as raw Shift-JIS (not URL-encoded), and whenever that data contains CJK characters, Rocket skips the request:
I thought adding
An example:
#[derive(rocket::FromForm)]
struct GreetingData {
    name: String,
}
#[rocket::post("/", data = "<data>")]
fn greet_route(data: rocket::form::Form<GreetingData>) -> String {
    format!("Hello, {}", data.name)
}
#[rocket::launch]
fn rocket() -> _ {
    rocket::build().mount("/", rocket::routes![greet_route])
}
Make two files with different encodings,
Result:
A bit different but related problem:
Replies: 1 comment 5 replies
To make a long story short: I don't expect Rocket to ever support this, since it's not standards-conforming. https://url.spec.whatwg.org/#application/x-www-form-urlencoded has details on the content type, but the tl;dr is that x-www-form-urlencoded doesn't actually have a charset argument and is always UTF-8. It does note that servers supporting legacy applications may have to support other charsets, but I don't expect this to be part of Rocket's form parser.
One solution is to write a custom form parser that accepts Shift-JIS, but this is time-consuming and error-prone. Alternatively, you could write a wrapper that first converts the form data to UTF-8, and then uses Rocket's form parser to parse the UTF-8. I've gone ahead and written an implementation, which seems to fix the issue as outlined above:
use encoding::Encoding;
use rocket::{
    data::{Data, FromData, ToByteUnit},
    form::{self, Error, Form, FromForm},
    http::Status,
    request::{Request, local_cache},
    outcome::{try_outcome, IntoOutcome},
};
// Form guard that transcodes Shift-JIS bodies to UTF-8 before parsing.
pub struct CharsetAwareForm<T>(T);
#[rocket::async_trait]
impl<'r, T> FromData<'r> for CharsetAwareForm<T>
where
    T: FromForm<'r>,
{
    type Error = <Form<T> as FromData<'r>>::Error;
    async fn from_data(
        req: &'r Request<'_>,
        data: Data<'r>,
    ) -> rocket::data::Outcome<'r, Self, Self::Error> {
        // Take the custom path only for form requests that declare `charset=Shift-JIS`.
        // Note that the value comparison is case-sensitive.
        if req.format().map_or(false, |format| {
            format.is_form()
                && format
                    .params()
                    .any(|(k, v)| k == "charset" && v == "Shift-JIS")
        }) {
            // Read the whole body, bounded by the "form" limit (falling back to 2 MiB).
            let by = try_outcome!(data
                .open(
                    req.rocket()
                        .config()
                        .limits
                        .get("form")
                        .unwrap_or(2usize.mebibytes())
                )
                .into_bytes()
                .await
                .map_err(|e| {
                    // Wrap the I/O error in Rocket's form error type.
                    let mut errors = rocket::form::Errors::new();
                    errors.push(Error {
                        name: None,
                        value: None,
                        kind: form::prelude::ErrorKind::Io(e),
                        entity: form::prelude::Entity::Form,
                    });
                    errors
                })
                .into_outcome(Status::BadRequest));
            // Decode to a UTF-8 string and keep it in request-local cache so it lives for 'r.
            let s = local_cache!(
                req,
                encoding::all::WINDOWS_31J
                    .decode(&by, encoding::DecoderTrap::Replace)
                    .unwrap()
            );
            Form::parse(s).map(|t| Self(t)).into_outcome(Status::UnprocessableEntity)
        } else {
            // Every other request goes through Rocket's regular form guard.
            <Form<T> as FromData<'r>>::from_data(req, data)
                .await
                .map(|f| Self(f.into_inner()))
        }
    }
}
impl<T> std::ops::Deref for CharsetAwareForm<T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
}
It uses the encoding crate (which doesn't have a specific
This doesn't handle percent-encoded characters at all, but it could be added.
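If percent-encoded input does need to be supported, one possible approach is to undo the percent-encoding on the raw bytes first, and only then run the charset decoder on each key and value. The following is a minimal, standard-library-only sketch of that first step. The names `percent_decode` and `split_pairs` are hypothetical helpers and not part of Rocket or the encoding crate, and the sketch assumes that malformed `%` sequences should be passed through unchanged.

```rust
/// Decode `%XX` escapes and `+` (as a space) in one raw form key or value.
/// Works on bytes, since the result may be Shift-JIS rather than UTF-8.
/// Hypothetical helper, not a Rocket API.
fn percent_decode(input: &[u8]) -> Vec<u8> {
    // Convert one ASCII hex digit to its numeric value.
    fn hex_val(b: u8) -> Option<u8> {
        match b {
            b'0'..=b'9' => Some(b - b'0'),
            b'a'..=b'f' => Some(b - b'a' + 10),
            b'A'..=b'F' => Some(b - b'A' + 10),
            _ => None,
        }
    }

    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        // A valid escape is `%` followed by exactly two hex digits.
        if b == b'%' && i + 2 < input.len() {
            if let (Some(hi), Some(lo)) = (hex_val(input[i + 1]), hex_val(input[i + 2])) {
                out.push((hi << 4) | lo);
                i += 3;
                continue;
            }
        }
        // Anything else is copied through, with `+` mapped to a space.
        out.push(if b == b'+' { b' ' } else { b });
        i += 1;
    }
    out
}

/// Split a raw body on `&` and the first `=` of each pair, then
/// percent-decode both sides. Hypothetical helper, not a Rocket API.
fn split_pairs(body: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    body.split(|&b| b == b'&')
        .filter(|pair| !pair.is_empty())
        .map(|pair| {
            // A pair without `=` gets an empty value.
            let mut parts = pair.splitn(2, |&b| b == b'=');
            let key = parts.next().unwrap_or_default();
            let value = parts.next().unwrap_or_default();
            (percent_decode(key), percent_decode(value))
        })
        .collect()
}
```

Each decoded key and value could then be passed through the same Shift-JIS decoder as in the wrapper above. Splitting has to happen before percent-decoding, because a decoded value may itself contain `&` or `=` bytes.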