Only download pages matching the user's languages #345
base: main
The code is pretty nice. However, I'm not yet convinced about the usability tradeoffs of this feature.
Advantages of Per-Language Downloads
First of all, the advantages: I think the only advantage is that we don't download all languages at once and thus speed up the update in case of a slow internet connection.
Right now the complete `tldr.zip` download is 7.7 MiB in size. The English pages alone are 1.8 MiB. German pages are 259 KiB.
(Edit: I looked at the original issue, and disk space was mentioned as an issue as well. Right now the pages use 76 MiB. 17 MiB of those could be saved by deduplicating the now-duplicated `pages` and `pages.en` directories. However, both 59 and 76 MiB seem like a non-issue on today's hardware, where 1 TiB of storage costs around $40. People who want to save disk space should ideally pick a tldr client that fetches the pages on demand, without a cache. Alternatively, a wrapper script could remove unneeded languages after a tealdeer update.)
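For illustration, the core decision of such a wrapper could look roughly like the sketch below. The function name `is_unneeded_language_dir` is hypothetical and not part of tealdeer; it only assumes the `pages` / `pages.<lang>` directory naming mentioned above.

```rust
/// Returns true if `dir_name` is a tldr pages directory for a language
/// that is not listed in `keep`. Other directories are never flagged.
/// (Hypothetical helper, not part of tealdeer.)
fn is_unneeded_language_dir(dir_name: &str, keep: &[&str]) -> bool {
    // `pages` holds the English pages; `pages.<lang>` holds one language each.
    let lang = match dir_name {
        "pages" => "en",
        other => match other.strip_prefix("pages.") {
            Some(lang) => lang,
            None => return false, // not a pages directory, leave it alone
        },
    };
    !keep.contains(&lang)
}
```

A wrapper would call this for every entry in the cache directory after `tldr --update` and delete the directories for which it returns true.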
Disadvantages of Per-Language Downloads
I gave it a quick try, and noticed a few issues:
- Messages in case of a cache miss are not clear. If I run `tldr --update`, only the English pages are downloaded. If I now run `tldr -L fr tar`, I get this error message:
  Warning: Page `tar` not found in cache. Try updating with `tldr --update`, or submit a pull request to: https://github.com/tldr-pages/tldr
  ...however, running `tldr --update` will not help; it will simply re-download the English pages. Tealdeer has no way of knowing whether a French version of that page exists (it does), so UX will always be non-ideal.
  One workaround would be to suggest running `tldr -L <current-language> --update` if the language isn't the default language (according to the config, with fallback to English). That adds complexity, though.
- The feature has a very ugly interaction with auto-update. If I run `tldr -L de tar` with a stale cache, the old cache will be nuked and replaced with a German cache. Next time I run `tldr git` with the default language set to English, it will complain that the page wasn't found in the cache. However, because the cache isn't stale (it was just fetched), it will not be auto-updated either.
- Depending on how many languages the user has configured, given a fast internet connection, cache updates may be slower than before (because we sequentially download and unzip the language archives).
- Finally, the handling of all those archives, in combination with the update logic, adds considerable complexity to the codebase. I'm fairly sure there are other UX issues that we may have missed.
All in all, this seems like a nice feature at first, but I'm not sure the (in my opinion tiny) advantages outweigh the disadvantages. What's your opinion, @niklasmohrin?
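To make the workaround for the cache-miss message concrete, here is a minimal sketch. It assumes the requested and default languages are available as plain strings; `cache_miss_hint` is a hypothetical helper, not existing tealdeer code.

```rust
/// Builds the warning text for a cache miss. If the requested language
/// differs from the default one, the hint includes the `-L` flag.
/// (Hypothetical helper, not existing tealdeer code.)
fn cache_miss_hint(page: &str, requested_lang: &str, default_lang: &str) -> String {
    let update_cmd = if requested_lang == default_lang {
        "tldr --update".to_string()
    } else {
        format!("tldr -L {requested_lang} --update")
    };
    format!(
        "Page `{page}` not found in cache.\n\
         Try updating with `{update_cmd}`, or submit a pull request to:\n\
         https://github.com/tldr-pages/tldr"
    )
}
```

This only improves the message. It does not address the auto-update interaction described in the second point.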
@@ -61,7 +61,7 @@ const APP_INFO: AppInfo = AppInfo {
     name: NAME,
     author: NAME,
 };
-const ARCHIVE_URL: &str = "https://tldr.sh/assets/tldr.zip";
+const ARCHIVES_URL: &str = "https://tldr.sh/assets/";
Maybe this should be called `ARCHIVE_URL_PREFIX` instead? (If yes, the cache `update` method parameter needs a rename as well.)
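A rough sketch of how a prefix constant could be combined with a language code follows. The `tldr-pages.<lang>.zip` file name and the `archive_url` helper are assumptions for illustration, not taken from this diff.

```rust
/// The constant from the diff, renamed as suggested in this comment.
const ARCHIVE_URL_PREFIX: &str = "https://tldr.sh/assets/";

/// Builds the download URL for the archive of a single language.
/// The `tldr-pages.<lang>.zip` file name is an assumption for illustration.
fn archive_url(prefix: &str, lang: &str) -> String {
    format!("{prefix}tldr-pages.{lang}.zip")
}
```

With a name like `ARCHIVE_URL_PREFIX`, it is clearer at the call site that the value is not a complete download URL on its own.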
@@ -134,7 +134,7 @@ impl Cache {
     }

     /// Download the archive from the specified URL.
Maybe document the return type?
-/// Download the archive from the specified URL.
+/// Download the archive from the specified URL.
+/// Return `None` if the URL results in a HTTP 404.
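As a sketch of the contract that doc comment describes: the `Response` struct and the `archive_bytes` function below are hypothetical stand-ins, not the HTTP client types used in the PR.

```rust
/// Hypothetical, simplified stand-in for an HTTP response.
struct Response {
    status: u16,
    body: Vec<u8>,
}

/// Returns `Ok(Some(bytes))` on success, `Ok(None)` on HTTP 404
/// and `Err(..)` for any other status code.
fn archive_bytes(response: Response) -> Result<Option<Vec<u8>>, String> {
    match response.status {
        200..=299 => Ok(Some(response.body)),
        // A missing archive is not an error: the language may simply not exist.
        404 => Ok(None),
        other => Err(format!("unexpected HTTP status {other}")),
    }
}
```

Documenting the `None` case matters here because callers need to tell "no archive for this language" apart from a real download failure.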
The main motivation for me is honestly that it just feels like the right thing to do. Downloading ~40 languages when most people only need one or two is excessive and extra work which is not in the spirit of tealdeer as I perceive it. I have to admit, this is a tough argument to bring, given that I previously argued against
None of this solves the problems you pointed out though, so here is what I think about each of those:
So all in all, I agree that there are problems, but I am still convinced that language-specific downloads are the way to go, and that I would have implemented it like this if I were to write tealdeer from scratch today. What do you think?
Such a message would need to be conditional on whether or not a custom language is enabled in the settings, or via CLI parameters. Otherwise, it might be quite confusing, if I search for a page that doesn't exist in the tldr repository, and get the message that I need to "install some language".
That might be possible, but I think that would add a lot of additional complexity in the codebase that we don't need right now. It's harder to contribute to the code without fully understanding related concepts, and bugs are more likely to happen in certain uncommon combinations / configurations. All in all, I still think we're better off in the long run if we don't add support for this.
I'd like to develop a client implementation that is fast, works for the majority of use cases, has a clean codebase and is easy to maintain. I fear that if we start adding more complexity to cover rare use cases, then the codebase becomes unnecessarily complex and the project will start to feel like a day job. And at that point, I'd lose the motivation to keep working on it 🙂
@niklasmohrin would it be OK for you if we closed this PR?
I don't agree. I think the complexity related to languages is mostly already there: we already have to parse language strings, find the correct directory name for a given language and respect the order. I agree that adding all needed changes to support the features from the issue and from what we discussed here will add complexity, but I think this is just incidental complexity that occurs because we are adding this after the fact. I think that if we sat down and maybe refactored the
I suppose we could close the PR if you are unhappy with the trend the code is taking here, but I would not abandon the feature - I still think it is worthwhile to have and a behavior I would expect "good software" to have. After all, we have a lot of code for configuring the style of the output, something that I would consider less of a priority than language-specific downloads if I were to write a tldr client from scratch today.
As far as vision goes, I agree that the project should stay maintainable and fast. Still, I see tealdeer as "the sane choice for tldr on the command line" and from that perspective I think that we should not refrain from implementing measures to reduce disk space and internet usage for minimalist users. I don't know, for some reason I draw the line somewhere between this PR and git downloads :^)
On Windows 11, with tealdeer installed via scoop, I used the `timeit` command of nushell and ran `tldr --update`. It took 6min 2sec 768ms 912µs 800ns to update the cache. Other details: this was done over WiFi, and the laptop in question has a slower HDD.
To be honest, I doubt that. But if you could pull that off, I wouldn't be opposed to merging it. My main worry is really the added complexity and the increased maintenance burden. How should we proceed?
Okay, so I will sit down and do a big refactor when I find the time for it. Let's keep this draft open for visibility.
Okay, what do you think about the following:
// new_cache.rs
use std::{path::Path, time::Duration};
use crate::{cache::PageLookupResult, types::PlatformType};
use anyhow::Result;

pub struct Language<'a>(&'a str);

pub struct CacheConfig<'a> {
    pub pages_directory: &'a Path,
    pub custom_pages_directory: Option<&'a Path>,
    pub platforms: &'a [PlatformType],
    pub languages: &'a [Language<'a>],
}

/// The directory backing this cache is checked to be populated at construction.
pub struct Cache<'a> {
    config: CacheConfig<'a>,
}

impl<'a> Cache<'a> {
    pub fn open(config: CacheConfig<'a>) -> Result<Option<Self>> {
        todo!()
    }
    pub fn open_or_create(config: CacheConfig<'a>) -> Result<Self> {
        todo!()
    }
    pub fn age(&self) -> Result<Duration> {
        todo!()
    }
    pub fn list_pages(&self) -> impl IntoIterator<Item = String> {
        []
    }
    pub fn find_page(&self, command: &str) -> Option<PageLookupResult> {
        todo!()
    }
    pub fn clear(self) -> Result<()> {
        todo!()
    }
    pub fn update(&mut self, archive_url: &str) -> Result<()> {
        todo!()
    }
}
In
let cache_config = new_cache::CacheConfig {
    pages_directory: &config.directories.cache_dir.path.join("tldr-main"),
    custom_pages_directory: config
        .directories
        .custom_pages_dir
        .as_ref()
        .map(PathWithSource::path),
    platforms: todo!(),
    languages: todo!(),
};
// `mut` is required because `Cache::update` takes `&mut self`.
let mut cache = if args.update {
    new_cache::Cache::open_or_create(cache_config)?
} else {
    new_cache::Cache::open(cache_config)?.context("Cache not found, run `tldr --update`.")?
};
if args.clear_cache {
    cache.clear()?;
    clear_deprecated_cache(&config)?;
    return Ok(());
}
if should_update_cache(todo!(), &args, &config) {
    cache.update(todo!())?;
    clear_deprecated_cache(&config)?;
}
if args.list {
    for page in cache.list_pages() {
        println!("{}", page);
    }
    return Ok(());
}
ensure!(
    !args.command.is_empty(),
    "CLI parsing should have not allowed this!"
);
// Note: According to the TLDR client spec, page names must be transparently
// lowercased before lookup:
// https://github.com/tldr-pages/tldr/blob/main/CLIENT-SPECIFICATION.md#page-names
let command = args.command.join("-").to_lowercase();
let Some(lookup_result) = cache.find_page(&command) else {
    if !args.quiet {
        print_warning(
            enable_styles,
            &format!(
                "Page `{}` not found in cache.\n\
                 Try updating with `tldr --update`, or submit a pull request to:\n\
                 https://github.com/tldr-pages/tldr",
                &command
            ),
        );
    }
    bail!("Page not found")
};
if let Err(ref e) = print_page(&lookup_result, args.raw, enable_styles, args.pager, &config) {
    print_error(enable_styles, e);
    bail!(e)
}
Ok(())
One thing that I paid attention to is that the cache should not care about outputting messages to the user - no
enum DomainError<'a> {
    CacheNotFound,
    // A borrowed string in an enum variant needs an explicit lifetime.
    PageNotFound(&'a str),
    // ...
}
Cache::open(cache_config)?.or_domain_error(&output_manager, DomainError::CacheNotFound)?
... but that is a topic for another time. Another thing: I returned early from Note that I already put it
fn clear_deprecated_cache(config: &Config) -> anyhow::Result<()> {
    let deprecated_config = new_cache::CacheConfig {
        pages_directory: &config.directories.cache_dir.path().join("tldr-master"),
        custom_pages_directory: None,
        platforms: &[],
        languages: &[],
    };
    let Some(cache) = new_cache::Cache::open(deprecated_config)? else {
        return Ok(());
    };
    cache.clear()?;
    Ok(())
}
So, what do you think? @dbrgn
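To illustrate how `find_page` could respect the language order with per-language directories, here is a minimal sketch. It uses plain `&str` values instead of the `Language` and `PlatformType` types from the proposal, and both the `pages.<lang>/<platform>/<command>.md` layout and the `candidate_paths` helper are assumptions for illustration.

```rust
use std::path::{Path, PathBuf};

/// Lists the paths a lookup would try, in priority order: languages first
/// (in the user's order), then platforms within each language.
/// (Hypothetical helper; the directory layout is an assumption.)
fn candidate_paths(
    pages_directory: &Path,
    languages: &[&str],
    platforms: &[&str],
    command: &str,
) -> Vec<PathBuf> {
    let mut candidates = Vec::new();
    for lang in languages {
        // Assumed layout: one `pages.<lang>` directory per downloaded language.
        let lang_dir = pages_directory.join(format!("pages.{lang}"));
        for platform in platforms {
            candidates.push(lang_dir.join(platform).join(format!("{command}.md")));
        }
    }
    candidates
}
```

A lookup would then return the first candidate that exists on disk, so a language that was never downloaded is skipped without any special casing.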
Any update on this?
Closes #335, although there are some follow ups in the comments that we should probably open new issues for.
Also, I opened tldr-pages/tldr#11121 in the process.