Add spanish word list #99

Merged
merged 3 commits into from
Dec 18, 2023

Conversation

jawlenskys
Contributor

Add an 8192-word Spanish wordlist. The wordlist was generated from two different sources of words: /usr/share/dict/spanish and the CREA word list.

The final Spanish wordlist was generated as follows:

* Start from /usr/share/dict/spanish and filter out:
    - words not matching /^[a-z]+$/
    - words shorter than 4 characters
    - words longer than 8 characters
* Do the same with the CREA word list.
* Compare both lists and extract the words that appear in both lists.
* Remove all words that are a suffix of any other word in the list.
* Remove some offensive and non-Spanish words.
* Replace some of the hard-to-remember words
  with easy-to-remember words from CREA's list.
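
The automatable steps above (filtering, intersecting, dropping suffix words) could be sketched roughly as follows. This is not the script actually used for this PR; the function names `filter_words`, `drop_suffix_words` and `build_wordlist` are hypothetical, and both inputs are assumed to be plain iterables of words, one word per entry. The manual steps (removing offensive or non-Spanish words, swapping hard-to-remember words) are not covered.

```python
import re

# Only plain lowercase ASCII words, as in the /^[a-z]+$/ filter above.
WORD_RE = re.compile(r"[a-z]+")


def filter_words(words, min_len=4, max_len=8):
    """Keep words matching [a-z]+ whose length is within [min_len, max_len]."""
    stripped = (w.strip() for w in words)
    return {
        w for w in stripped
        if WORD_RE.fullmatch(w) and min_len <= len(w) <= max_len
    }


def drop_suffix_words(words):
    """Remove every word that is a proper suffix of another word in the set."""
    words = set(words)
    proper_suffixes = set()
    for w in words:
        for i in range(1, len(w)):
            proper_suffixes.add(w[i:])
    return words - proper_suffixes


def build_wordlist(dict_words, crea_words):
    """Filter both sources, keep the common words, drop suffix words."""
    common = filter_words(dict_words) & filter_words(crea_words)
    return sorted(drop_suffix_words(common))


if __name__ == "__main__":
    # Tiny made-up inputs, for illustration only.
    dict_words = ["casa", "escasa", "sol", "niño", "perro", "gato"]
    crea_words = ["casa", "escasa", "perro", "gato", "luna"]
    print(build_wordlist(dict_words, crea_words))
```

In practice the first argument would be the lines of /usr/share/dict/spanish and the second the words extracted from the CREA list; how the CREA file is parsed is left out here because its format is not described in this PR.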

@ulif ulif merged commit 956f2b1 into ulif:master Dec 18, 2023
1 check passed
@ulif
Owner

ulif commented Dec 18, 2023

Hey Victor,
Impressive work! Thanks a lot!

/ulif

@jawlenskys jawlenskys deleted the wordlist-es branch October 4, 2024 13:51
@jawlenskys
Contributor Author

Hi @ulif,
I've only just realised that I didn't specify the licence of the word lists I provided (es, ca and it). I'm fine with CC-BY-3.0.

These three word lists were also provided as part of Tails. Should I create a PR to add this information?

@ulif
Owner

ulif commented Nov 28, 2024

Hi Victor,

Could you also imagine applying CC BY 4.0 to your lists? It seems to be a bit easier for licensees to comply with (no need anymore to state the title of a work, for instance) and looks more mature than the previous CC licenses. I will switch at least my own lists to CC BY 4.0. (Differences)

Awesome that your lists are also part of Tails. This info should also be given in the diceware docs. I love Tails, also support it financially, and try to spread it whenever possible :)

I am planning a (major) release around Christmas, just waiting for a longer French wordlist from Tango. Therefore I would be happy to incorporate your copyrights (with a remark about Tails) in diceware, but also wouldn't refuse a PR if you have the time :)

@jawlenskys
Contributor Author

> Could you also imagine applying CC BY 4.0 to your lists?

Yes, no problem.

> I am planning a (major) release around Christmas, just waiting for a longer French wordlist from Tango. Therefore I would be happy to incorporate your copyrights (with a remark about Tails) in diceware, but also wouldn't refuse a PR if you have the time :)

That sounds great! Done in #113.
