Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some improvements. #951

Merged
merged 21 commits into from
Jan 20, 2024
Merged

Some improvements. #951

merged 21 commits into from
Jan 20, 2024

Conversation

Shuraken007
Copy link
Contributor

Hello. I added some extra staff from exa (https://github.com/ogham/exa) .
Specific icons + colors.

  • Added directory icon. Detected if path ends with os specific delimiter. .../dir/
  • Replaced with exa icons (much more icons)
  • Colorized icons: grouped files by categories, configs, helps, e.t.c.
  • Short output

Alas I'm new at rust. Probably do some things not optimal, espetially at creating struct File (had to copy data, no references).
I havn't touch tests.
Devicons works x2 times slower probably.
But still 3 seconds for 200k+ files vs 1 second without devicon. Suites for me.

New options:

   -i --iconcolor
   Colorize icons

   -n --nameshort
   Shorten path to filename only

   -d --dirshort
   Shorten path with compact dirs

Examples:

  • no additional options
    image

  • color icons
    image

  • nameshort
    image

  • dirshort1
    image

  • dirshort2
    image

@coreyja
Copy link
Owner

coreyja commented Nov 21, 2023

Hey @Shuraken007 I didn't see this over the weekend but this is great!

I'll take a look in the next couple days hopefully and either leave some notes.

I appreciate the PR! I don't use devicon-lookup as much as I used to so glad others are still getting value out of it!

improve shortening
made file-name white colour
@Shuraken007
Copy link
Contributor Author

Btw, as you see - next commit appeared.

  • support for fzf, added full path, separated with !
  • path shortening : normal and reversed
  • big file names also shorter

image

image

Comment on lines 28 to 42
f if self.is_temp(f) => Fixed(244).normal(),
f if self.is_immediate(f) => Fixed(1).bold().underline(),
f if self.is_image(f) => Fixed(37).normal(),
f if self.is_video(f) => Fixed(135).normal(),
f if self.is_music(f) => Fixed(92).normal(),
f if self.is_lossless(f) => Fixed(93).normal(),
f if self.is_crypto(f) => Fixed(109).normal(),
f if self.is_document(f) => Fixed(187).normal(),
f if self.is_compressed(f) => Red.normal(),
f if self.is_compiled(f) => Fixed(137).normal(),
f if self.is_pretty_data(f) => Fixed(178).normal(),
f if self.is_script(f) => Fixed(173).normal(),
f if self.is_config(f) => Fixed(65).normal(),
f if self.is_vim(f) => Fixed(71).normal(),
f if self.is_language(f) => Fixed(75).normal(),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, I'm not sure I've seen this style of match with if self inside in the wild.
I really like it though!

@coreyja
Copy link
Owner

coreyja commented Nov 30, 2023

Ok @Shuraken007 this is super awesome!! Sorry it took me awhile to take a look!

But I looked at the code and I don't have many comments to leave this is awesome! I'm happy to take a closer look at the data and see if we can weave references in more if you want.

Are the devicon_lookup1 and file1 the original versions of the code? I don't see those being loaded into the current code right?

For testing did you have any thoughts? It looks like one of the reasons the tests are failing is that the output includes a leading slash by default now.
Totally up to you if you like that format, but at least a couple of the tests might pass if you strip that out

But if you like the new format, I'd be happy just updating the tests with the new format and icons and calling it a day! And/or change any up if you think something else could be better.
Most (all?) of the current tests are written to just consume the binary, so shouldn't use any internals. My theory when I wrote them is it would be easier to adapt to the code changing, lets see how it goes!

And for the lint-rust job I think you should be able to run cargo fmt and get all those formatting issues handled.

And to wrap up, do you want commit access here? This PR is awesome! And like I said before I'm not using devicon-lookup anymore, so happy to give you commit access to keep improving things if you'd be interested! Just let me know, no pressure though of course!

Oh and one more thing! Just since you mentioned being new to Rust, I wanted to mention clippy which you can run with cargo clippy! Its a really cool linter that can point out things you may be able to improve, and show you ways to simplify code. Its been really helpful to me in learning Rust so wanted to share!

Thanks again for this PR, it really is great!

@Shuraken007
Copy link
Contributor Author

@coreyja

  • removed extra files devicon_lookup1, file1
  • fixed tests (but not added new for testing new flags)

The most change, I suppose - you should add '-s' / '--substitute' flag, when you want to save extra things
Also I need optimize expertize, cause my stuff slows output some times.
Is it possible to make structure File, that won't copy everything, but operated with &str, links to some parts of original file_path.
I tried, but had lot of problem with borrowin issues.

@Shuraken007
Copy link
Contributor Author

@coreyja , I make simple prototipe with struct FIle on &str based
Can you solve problem:
'filename' is borrowed here
returns a value referencing data owned by the current function

simple.zip

@coreyja
Copy link
Owner

coreyja commented Dec 6, 2023

Hey @Shuraken007 thanks for giving that prototype a try! Thanks for taking a stab at the reference optimizations!

I definitely can try and help out with the borrow errors! I'll try to take a look this week, but there is a chance I won't get to it before the weekend. Just wanted to give you an timeline so you weren't waiting for me lol

@Shuraken007
Copy link
Contributor Author

Shuraken007 commented Dec 6, 2023

@coreyja, got it.
By the way, I tried to run the simpliest logic, turned off all logic, icon calculation, just output original input on print_with_symbol.

It's still 3 times slower, than your main branch.
All is eated by File::new structure.

… plus we weren't adding the right / back to the end
…ctly

This refactor is mostly so that we don't need to create instances of `FileExtensions`

This struct is more just a collection of functions, so it didn't make sense
to create an instance of it.
I might even remove the struct entirely in the future, and just have these be
free functions that live in a module
@coreyja
Copy link
Owner

coreyja commented Dec 12, 2023

OK @Shuraken007!

I'm sitting down to take a look at this PR. I think I'm going to do this "Lab Notebook" style, where I'm just going to record basically everything I am doing here so that hopefully you can follow along with!

To start off I want to run the existing Benchmarks to see what the main branch vs this new branch are looking like.

Initial Benchmarks

main

Ran with cargo bench

    Finished bench [optimized] target(s) in 0.04s
     Running unittests src/main.rs (target/release/deps/devicon_lookup-f5f68abaef0a04b0)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/benchmark_cli.rs (target/release/deps/benchmark_cli-779b5f26c4988c0e)
Gnuplot not found, using plotters backend
benchmark_each_file_type_plain_cli
                        time:   [9.7544 µs 9.8062 µs 9.8669 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

benchmark_each_file_type_color_cli
                        time:   [9.9813 µs 10.056 µs 10.140 µs]
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

benchmark_each_file_type_regex_cli
                        time:   [15.335 µs 15.401 µs 15.474 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Benchmarking benchmark_each_file_type_prefix_cli: Collecting 100 samples in estimated 5.0487 s (328k iterations
benchmark_each_file_type_prefix_cli
                        time:   [15.192 µs 15.224 µs 15.256 µs]
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  6 (6.00%) high severe

forkBranch

Ran with cargo bench again

   Compiling devicon-lookup v0.9.1 (/Users/coreyja/Projects/devicon-lookup)
    Finished bench [optimized] target(s) in 0.85s
     Running unittests src/main.rs (target/release/deps/devicon_lookup-317f01cf14dbae14)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/benchmark_cli.rs (target/release/deps/benchmark_cli-f7552f4c68a5a1fa)
Gnuplot not found, using plotters backend
benchmark_each_file_type_plain_cli
                        time:   [9.6448 µs 9.6759 µs 9.7122 µs]
                        change: [-2.7011% -2.0111% -1.3539%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  8 (8.00%) high mild
  9 (9.00%) high severe

benchmark_each_file_type_color_cli
                        time:   [10.068 µs 10.198 µs 10.371 µs]
                        change: [-0.6818% +0.1962% +1.0603%] (p = 0.68 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

benchmark_each_file_type_regex_cli
                        time:   [15.271 µs 15.325 µs 15.385 µs]
                        change: [-1.0660% -0.6298% -0.1623%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  9 (9.00%) high severe

Benchmarking benchmark_each_file_type_prefix_cli: Collecting 100 samples in estimated 5.0332 s (328k iterations
benchmark_each_file_type_prefix_cli
                        time:   [15.274 µs 15.313 µs 15.353 µs]
                        change: [+0.0372% +0.5370% +0.9775%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe


Analysis

Oh interesting!! So in the benchmarks in the repo, running on my Apple M1 Pro, is actually faster in one benchmark and very much within the noise threshold for the others!
I re-ran these benchmarks a few times and getting pretty consistent results! So I think this new version is NOT meaningfully worse than the original!

@Shuraken007 were you testing in release mode when doing your comparisons? You can do that with cargo build --release to drop the binary in target/release/devicon_lookup or you can use carge run --release to have cargo just run the binary after compilation!
I'm wondering if you were testing with the default debug mode for your new binary? It wouldn't surprise me too much that the --release was ~3x faster than the debug builds!

What you get with cargo install and the pre-compiled binaries in Github Releases were made with --release so thats what we want to compare to!

So I definitely don't think this is as bad as you thought initially! But lets dive into the code anyways and see what we see!

Code Exploration

NOTE: While I started this by thinking I was looking for performance optimizations I really didn't end up making any. We are still holding onto Owned data in File, so there is a chance we could optimize more. But given I'm not seeing this look slower I didn't optimize this farther. Its much easier to hold onto Owned data, so I like to avoid holding references in structs when we can afford to!

The first thing I want to look at is the File struct and specifically the ::new method since you called it out.

My first thought after looking at this struct is that it seems like a lot of the logic might be able to be replaced with usages of std::fs::File and std::path::Path (and its sibling PathBuf)!

Code Level Suggestion: When you are writing a function and want it to take a reference to a String you almost always want to use &str instead of &String. &str can be used by more things like a static string in code, where an &String is just a reference to an owned String type. I'm not sure that was the best explanation of that so happy to try and explain more! There is also this link that Clippy gives you about this: https://rust-lang.github.io/rust-clippy/master/index.html#/ptr_arg

I think I want the File struct to just contain a Path, and I believe we should be able to implement all its methods on top of that! I'm going to give that a shot! Path is a borrowed version of PathBuf so on second thought I might make this a PathBuf for now, and then try to optimize it to a Path later! In general its easier to work with owned data so it might be simpler to use PathBuf initially

Since the fields of File are currently public I'm first going to make them private and make methods for all the accesses to them. That way I can remove the fields and work on re-implementing the methods to work off the PathBuf

Ok that is done in commit 7a115ab 7a115ab!

Now I am working through removing the fields and replacing them with just a PathBuf. I commented out the more complicated methods and starting with the ones I just added.

File::dir() is giving me an issue however. I think I need to store the original path string in our Struct as well... This is because PathBuf::is_dir() hits the actual filesystem which I think we want to avoid (and your original impl didn't use the file system to check for a directory existing) AND PathBug::ends_with doesn't work since it's already stripped out any path separators.
So for now we will store both the PathBuf and the original String

Ok got all the old fields into methods that operate over the PathBuf and String! That is done in commit b7b04bb b7b04bb.

Going to look at the other methods I todo!ed but want to come back and see if we can stop saving the full path as a String.

I re-wrote most of short_path() to try and lean into PathBuf::components and using iterator map instead of building up the array manually.
I did NOT test this super well, so there is a good chance I didn't port over the exact logic that you had, so definitely take a look at this and see if it seems like its missing anything!
That is in commit 4ff62b5 4ff62b5

I made a few cleanup commits and refactored a few small things. I also found some bugs that I squashed. The commits have good messages, I'm up to commit bc7ce0b now

Next up is looking at FileExtension! The first thing I noticed is that we create an instance of this Struct, but it holds no data. I wonder if we need to create the instance at all. We maybe be able to make all the functions NOT take a &self and have them callable like FileExtension::
3d5521f

Ok I wrapped up for the evening in commit 6a62c54 6a62c54.
This is just some general cleanup! I refactored some of the if/else logic in a few methods so that its hopefully easier to follow!

And I think with that I'm going to wrap up for the evening!
What do you think? Were my changes helpful?
Was this write up helpful?

Any feedback is great to hear since it means I can work on improving my skills in reviewing code and mentoring others!

Thanks again for the awesome PR! As I was playing around with it I couldn't get over how cool it is!

@coreyja coreyja mentioned this pull request Dec 12, 2023
@coreyja coreyja linked an issue Dec 12, 2023 that may be closed by this pull request
@Shuraken007
Copy link
Contributor Author

@coreyja About benchmarks. I tested on my home directory.

cd ~
fd . --hidden | devicon-lookup | fzf

it's about 200k files.

and without any precise measures - I saw, that main branch takes smth like 1.5-2 sec, and forkBranch around 5-6.

Well, I made 3 folders with versions:

  • main branch - devicon_orig,
  • mine last commit - devicon_fork
  • your refactor - devicon_refactor
  • added simplest output for them
    pub fn print_with_symbol(&self, args: &Args) {
        let s = format!("{}\n", self.original);
        return write_to_stdout(s.as_bytes());

Created simple bash script for measure time.

#!/bin/bash

run_devicon_n_times() {
   name=$1
   times=$2
   for i in $(seq $times); do
      cat ~/tmp/devicon/file_list | eval "~/p/src/${name}/target/release/devicon-lookup" > /dev/null
   done
}

mkdir -p ~/tmp/devicon
fd . ~ --hidden > ~/tmp/devicon/file_list
echo 'total_files: ' $(cat ~/tmp/devicon/file_list | wc -l)
echo
echo running 'devicon_fork':
time run_devicon_n_times devicon-lookup 10
echo
echo running 'devicon_refactor':
time run_devicon_n_times devicon_refactor 10
echo
echo running 'devicon_orig':
time run_devicon_n_times devicon_orig 10

So, result is fun, before refactoring it takes x6 times.

❯ ./test
total_files:  219919

running devicon_fork:

real    0m12.232s
user    0m11.322s
sys     0m1.102s
running devicon_refactor:

real    0m2.543s
user    0m1.597s
sys     0m1.080s

running devicon_orig:

real    0m2.430s
user    0m1.668s
sys     0m0.896s

With calculating icon / extension, e.t.c.

❯ ./test
total_files:  219920

running devicon_fork:

real    0m13.753s
user    0m12.652s
sys     0m1.299s

running devicon_refactor:

real    0m5.074s
user    0m4.171s
sys     0m1.055s

running devicon_orig:

real    0m2.801s
user    0m1.859s
sys     0m1.087s

I tested, what gives this slowness, it's regex.
let re = RE_PATH.captures(full_path).unwrap();
I added this to your refactored code, and user_time rises to sky: 1.6 -> 12
near with time, before your refactoring

Also I played around PathBuf, added calculations for File::new.

        let path = PathBuf::from(full_path); 
        let parent = path.parent().unwrap();
        let component_count = parent.components().count();
        let ext = path.extension();

time 1.6 -> 2.5
So, I'm surprised, that PathBuf is extremly effective.
I thought it should use smth like regex inside too.

I tried PathBuf, when started forkBranch. Founded, that it goes to file system on checking is_dir, and decided to make smthing simple without such unexpected dependencies. Ha-ha.

@Shuraken007
Copy link
Contributor Author

Also, thanks a lot for reviewing.
I think, that it's better to use github review system.
Better add comments there (extra overexplained).
It's easier to use dialog there.

@coreyja
Copy link
Owner

coreyja commented Dec 12, 2023

I think, that it's better to use github review system.
Better add comments there (extra overexplained).
It's easier to use dialog there.

Ya great point I really should have used Github Reviews, and looking back not really sure why I decided not to. Thanks for the callout!

And thanks for doing some more benchmarking! That makes sense that its regex that is slow! regex'es are fast in general but doing them in a loop definitely can add up quick!

I tried PathBuf, when started forkBranch. Founded, that it goes to file system on checking is_dir, and decided to make smthing simple without such unexpected dependencies. Ha-ha.

Haha love that you ran into this problem too!

@coreyja
Copy link
Owner

coreyja commented Jan 20, 2024

Hey @Shuraken007 sorry for the long delay here!

But thanks again for this awesome PR! I'm going to merge it in now, and should have a new version made this afternoon that include these changes! Likely as a 0.10.0 release!

Thanks again!!

@coreyja coreyja merged commit 627f4d0 into coreyja:main Jan 20, 2024
2 checks passed
@petobens
Copy link

petobens commented Jan 27, 2024

Hi @Shuraken007 and @coreyja .. I'm trying the new release and getting the following error: (also even though I'm listing directories the folder icon is not shown; is there an extra setting needed?)
image

Another example: if I run z or zoxide i would expect the items in the list also to be folder icons but this doesn't happen:
image

@coreyja
Copy link
Owner

coreyja commented Jan 27, 2024

Hey @petobens thanks for the bug report! Taking a look now

Looking into the unwrap error first, and then I'll look into the dir icons

@coreyja
Copy link
Owner

coreyja commented Jan 27, 2024

@petobens this should be fixed in 0.10.1! Give it a try and let me know

If it doesn't fix both those things definitely let me know!
I removed that whole unwrap case entirely so that shouldn't error anymore

And we are finally leaning on PathBuf to determine if something is directory (as opposed to just looking for a trailing slash). I thought this was in the last release, but I must not have snuck that in. Sorry about that and thanks again for the report!

@petobens
Copy link

Hi! Thanks for the quick turn around.

Running echo "/" | devicon-lookup (which is the valid root directory) still returns an error:
image

On the other hand. If I run zoxide query --list --score | devicon-lookup I don't get to see directory icons. Can that be fixed?
image

@coreyja
Copy link
Owner

coreyja commented Jan 31, 2024

Hey @petobens!

Thanks again for this report sorry the 0.10.x release has been a bit rocky lol
Just acking this report, its added to my list to look at! Hopefully will be able to look more this weekend

Thanks for all your help and reports! 🙏

@petobens
Copy link

Great!Thanks

@coreyja
Copy link
Owner

coreyja commented Feb 3, 2024

Ok @petobens! Give 0.10.2 a try! I think I removed the last of the unwraps so the panics should be gone 🤞
The one from a root dir should definitely be fixed!

I also got a version working with zoxide specifcally the stats command. I think you'll need to use --regex (-r) to pick out the filepath from the line
I was able to get this working!

zoxide query --list --score  | devicon-lookup -r '\d +(.*)$' -s

The -s option substitutes the icon into where the regex capture group was. This makes it so you can continue to see the scores from zoxide

image

@petobens
Copy link

petobens commented Feb 3, 2024

Works great! Thank you :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add directory icon
3 participants