imprecise hint #750

Open
mircealungu opened this issue Apr 3, 2025 · 12 comments
Comments

@mircealungu
Member

The hint should say "you need to change one letter" rather than "add one", right?

Image Image
@tfnribeiro
Contributor

The library doesn't support Danish letters (æ -> ae, ø -> oe and å -> aa).

@tfnribeiro
Contributor

I looked at another library called leven, which seems to not have this issue:

Image

https://www.npmjs.com/package/leven

So we could migrate to this one instead?

@mircealungu
Member Author

yes, let's!

@tfnribeiro
Contributor

So it turns out it might actually be due to another library we use: import removeAccents from "remove-accents";

The library transforms Æ -> AE, and that causes the problem we have now. I don't know what is best to do here: should we remove it, in which case people might struggle to type à, á, ã and so on, or should we keep it and only special-case ø, æ and å?
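To illustrate why the Æ -> AE expansion can turn a "change one letter" case into an "add one letter" case, here is a minimal sketch. `editDistance` is a plain Levenshtein implementation and `expandAe` is a hypothetical stand-in for the transliteration described above, not the actual `remove-accents` code; the word pair is only an example.

```javascript
// Plain Levenshtein distance: insert, delete and substitute each cost 1.
function editDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  // prev[j] holds the distance between the first i-1 chars of s and the first j chars of t.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Hypothetical stand-in for the Æ -> AE transliteration reported in this thread.
function expandAe(str) {
  return str.replace(/Æ/g, "AE").replace(/æ/g, "ae");
}

const expected = "læse"; // example solution word
const typed = "lese"; // example learner input, one wrong letter

// Without expansion: same length, distance 1, so the fix is a substitution.
const rawDistance = editDistance(expected, typed);
// With expansion: "laese" is one character longer than "lese",
// so the same distance of 1 now corresponds to a missing letter.
const expandedDistance = editDistance(expandAe(expected), typed);
const lengthGap = expandAe(expected).length - typed.length;
```

The distance is 1 in both cases, but after expansion the solution is one character longer than the input, which is presumably why the hint talks about adding a letter.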

@mircealungu
Member Author

write our own "library" for remove accents?

@mircealungu
Member Author

mircealungu commented Apr 9, 2025

e.g. cf. Claude:

Image

Source: https://claude.ai/share/7e1f36bf-fa2c-43a7-bdeb-d07aafc5e019

@tfnribeiro
Contributor

Seems fine on a first look - we could give it a try.

@tfnribeiro
Contributor

tfnribeiro commented Apr 10, 2025

So I tested that and it doesn't really work for Å:

Image

Image

@tfnribeiro
Contributor

I asked DeepSeek to see what it comes up with and this was the solution:

function removeAccents(str) {
  // Temporarily replace these with unique placeholders
  let placeholderCounter = 0;
  const placeholderMap = {};
  let tempStr = str.replace(/[ÅåÆæØø]/g, (match) => {
    const placeholder = `__SCANDI_${placeholderCounter++}__`;
    placeholderMap[placeholder] = match;
    return placeholder;
  });
  // Now normalize and remove diacritics
  tempStr = tempStr.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

  // Restore the Scandinavian characters
  return tempStr.replace(
    /__SCANDI_(\d+)__/g,
    (_, num) => placeholderMap[`__SCANDI_${num}__`],
  );
}

I think this could be simplified slightly, since the only character that doesn't survive normalization is Å/å. I guess that is because it is essentially treated like Á/á (a base letter plus a diacritic).
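One possible simplification, as a rough sketch: Æ/æ and Ø/ø have no canonical decomposition, so NFD leaves them alone, while Å/å decomposes into A/a plus the combining ring above (U+030A), which falls inside the stripped range. The hypothetical function `removeAccentsKeepScandi` below keeps that ring when it directly follows A/a and recomposes at the end. It has not been tested against the app's word lists.

```javascript
function removeAccentsKeepScandi(str) {
  return str
    .normalize("NFD")
    // Keep "A/a + combining ring above" untouched; strip every other combining mark.
    .replace(/([Aa]\u030A)|[\u0300-\u036f]/g, (match, ring) => (ring ? match : ""))
    // Recompose A + ring back into the single code point Å/å.
    .normalize("NFC");
}

const scandi = removeAccentsKeepScandi("Åå Ææ Øø");
const latin = removeAccentsKeepScandi("àáã é");
```

With this approach no placeholder map is needed, at the cost of a slightly denser regular expression.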

@tfnribeiro
Contributor

Sidenote, the function:

function removeQuotes(x) {
  return x.replace(/[^a-zA-Z ]/g, "");
}

does more than just remove quotes. It also removes "-", for example. Is this the intended behavior? Umbrella in Portuguese is "guarda-chuva", so removing the dash would make it technically incorrect. The same goes for "It's" and "Its", which are different words.
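If the intent really is only to drop quotation marks, a narrower version could look like the sketch below. `stripQuotationMarks` is a hypothetical name, the set of quote characters is an assumption, and apostrophes are deliberately kept so that "It's" and "Its" stay distinct. `removeQuotes` is copied from above for comparison.

```javascript
// Current behavior, copied from above: drops everything except ASCII letters and spaces.
function removeQuotes(x) {
  return x.replace(/[^a-zA-Z ]/g, "");
}

// Hypothetical alternative: remove only double quotation marks and guillemets,
// leaving hyphens, apostrophes and non-ASCII letters in place.
function stripQuotationMarks(x) {
  return x.replace(/["“”„«»]/g, "");
}

const before = removeQuotes('"guarda-chuva"'); // hyphen is lost
const after = stripQuotationMarks('"guarda-chuva"'); // hyphen is kept
```

Note that the character class in `removeQuotes` also drops any non-ASCII letter that reaches it, including æ, ø and å.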

@mircealungu
Member Author

guarda-chuva is a good name :)

@tfnribeiro
Contributor

guarda-chuva is a good name :)

Literally meaning: "Guard Rain" 😆
