Spanish Regex (to share) #37
Replies: 12 comments 8 replies
-
I have updated the file to improve the regex and added more groups and common subtitle sites. I would be interested if some Spanish speakers could review it and share additional sources. For example, I don't use any anime sites, so I won't have anything from there...
-
@KBlixt I'm also interested in your feedback.
-
Wow, cool 👍 I'll definitely take a look at this. No time today but will take a look at it tomorrow for sure 🙂
-
I've taken a peek at the config file. A few suggestions: es_warn2 ends with "W*(por|de|by)?\W*(:|;)..". Here the (por|de|by) shouldn't be optional, and the (:|;) should be dropped, since purge1 handles those. It would be great if you could use en_warn1 from the English profile as a double warning regex in the Spanish profile and remove any words there that could reasonably be used as a Spanish word. For example, the Swedish profile has a copy of en_warn1 as sv_warn6 and sv_warn7 in order to punish English words that are associated with subtitles, with any legitimate Swedish word stripped from these regexes. I hope you see what I mean.
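To illustrate the first suggestion, here is a minimal Python sketch. The `subt[ií]tulos` prefix and the sample lines are hypothetical stand-ins, since the full es_warn2 pattern is not shown in this thread; only the `(por|de|by)` group comes from the comment above, and case-insensitive matching is an assumption.

```python
import re

# Hypothetical prefix; the real es_warn2 pattern is not quoted in full here.
PREFIX = r"subt[ií]tulos"

# Optional connector: the pattern fires on the bare keyword as well.
loose = re.compile(PREFIX + r"\W*(por|de|by)?", re.IGNORECASE)

# Required connector, with the (:|;) part dropped as suggested.
strict = re.compile(PREFIX + r"\W*(por|de|by)", re.IGNORECASE)


def warns(pattern, line):
    """Return True if the pattern matches anywhere in the line."""
    return pattern.search(line) is not None


# Made-up sample lines: one piece of ordinary dialogue, one credit line.
dialogue = "Activa los subtítulos ahora"
credit = "Subtítulos por Juan"

print(warns(loose, dialogue), warns(strict, dialogue))
print(warns(loose, credit), warns(strict, credit))
```

With the optional group, the keyword alone is enough to trigger the warning, so ordinary dialogue that mentions subtitles gets flagged. Making the connector required narrows the match to credit-style lines.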
-
Thanks @KBlixt, I completely agree with your observations and have made those changes. Sorry, I am not an expert with regex, haha. Do you have any recommendations on how to handle these types of blocks, or do you think it is best to leave them as warnings? I am currently using the following to identify them as warnings: es_warn4: Spanish -
-
Just wanted to say thanks to @frasderp for making this. I've been using it since you posted it and it seems to work great. I even went and ran it on all of my old ES SRT files to clean those up too. I watched a lot of them go by in the console and I didn't see any obvious problems. Anyway, I appreciate you sharing this with us!
-
Hi @albino1, you're welcome! Could you share your log file with me, if possible? I would like to review the removed blocks and the warnings to see if the script can be improved further. @KBlixt I have been busy with work, but I changed the regex for those blocks to the following, to catch a few of the variations: es_warn4: spanish ?(-|]|/)
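As a rough check of what the revised es_warn4 pattern catches, here is a small Python sketch. The pattern is copied from the comment above; the sample lines are made up for illustration, and case-insensitive matching is an assumption.

```python
import re

# Pattern quoted from the comment above. Case-insensitive matching is
# an assumption made for this sketch.
es_warn4 = re.compile(r"spanish ?(-|]|/)", re.IGNORECASE)


def is_flagged(line):
    """Return True if es_warn4 matches anywhere in the line."""
    return es_warn4.search(line) is not None


# Made-up sample lines, not taken from real subtitle files.
samples = [
    "Spanish - Latin America",   # "Spanish", a space, then "-"
    "[Spanish]",                 # "Spanish" directly followed by "]"
    "spanish/english",           # "spanish" directly followed by "/"
    "I speak Spanish fluently",  # no separator after "Spanish"
]

for line in samples:
    print(line, "->", is_flagged(line))
```

The `]` is written outside a character class, where Python treats it as a literal bracket. The last sample is not flagged because the word is followed by ordinary text rather than one of the three separators.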
-
The logs are really long, but I can send them to you if you want. Here's a ton of examples of ones that it warned for and didn't remove: https://gist.github.com/albino1/f990c4d10f5ab6dc6a3104966b38b366 There's plenty more, but it would be easier to go through and pull out if there was a way to have subcleaner ignore duplicate lines and ignore the thing where it flags episode numbers and episode titles. 9/10 warnings are for duplicate lines, so it's just a lot to sift through :)
-
Thanks @albino1, this is a great help. There are definitely some things in there I need to add (some specific websites etc.) that I don't have a source for.
-
@KBlixt I have added an additional purge line to be more aggressive on "Subtitles by" / "Translated by" type lines, what do you think?
Also, I wanted to understand: on the first purge line, why do you use .. (the two spaces) at the end? In my testing, I think the regex works better without those.
(I think it works better to remove the two .. on the last one)
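For context on what a trailing `..` does in a regex, here is a minimal Python sketch. Both patterns are hypothetical stand-ins, since the actual purge line is not quoted in this thread. In regex syntax each `.` matches one arbitrary character other than a newline, so `..` requires two more characters after the preceding text.

```python
import re

# Hypothetical patterns for illustration only; the real purge regex is
# not shown in this thread.
with_dots = re.compile(r"subtitles by..", re.IGNORECASE)
without_dots = re.compile(r"subtitles by", re.IGNORECASE)


def matches(pattern, line):
    """Return True if the pattern matches anywhere in the line."""
    return pattern.search(line) is not None


# A made-up credit line with a name, and one that ends right after "by".
named = "Subtitles by John"
bare = "Subtitles by"

print(matches(with_dots, named), matches(without_dots, named))
print(matches(with_dots, bare), matches(without_dots, bare))
```

The version with the trailing `..` only matches when at least two characters follow on the same line, so it skips a line that ends right after the phrase. Whether that is the intent behind the original purge line is not stated here.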
-
@KBlixt, I'm uploading a tidied version (similar to the Portuguese structure), with a lot more subtitle groups / translation groups to purge.
-
@frasderp the new version works great, it removed a ton of stuff that got skipped before. I went through and found a bunch more after this latest run in case you were interested in updating it further: https://gist.github.com/albino1/6d77abd5d854480d962ba2f3a802d948 If not, no problem, it's already really good. If it's helpful, I can also send you any individual subtitle file for one of the ones listed above, or at least link you to them on whatever website Bazarr downloaded them from.
-
Hey there, I have developed a Spanish regex profile that I would like to share.
I translated the English one, and then added the most common Spanish sub sites / groups as well as common words they use. I have run it a few times on my library and fine-tuned it.
I would like to share it and get some feedback to make it even more effective. I have attached the .conf file in a zip.
I am also proposing to move some of the warnings I currently have to purge regexes, so that this style of block gets removed, but I'm interested in whether that would be a problem or not...
spanishv2.zip