sef.kloninger.com (Posts about movies)

Oppenheimer -- A Miss

Sef Kloninger — Mon, 19 Feb 2024 21:20:00 GMT

I was so disappointed in Oppenheimer. It wasn't terrible, but what +a missed opportunity.

Mystifying. Part of the job of a movie like this is to give you +insight into what the person is thinking and feeling. Especially +in this case -- what would it be like to be the person who made +these terrible weapons? That's really what I wanted to get from +this movie. I understand he's complicated, and it's complicated. +But instead of dialog, or even some outright explaining, this movie +instead has a lot of closeups of Cillian Murphy staring into the +distance, or adjusting his hat.

Women. There were plenty of interesting women around -- in +academia, in the Manhattan project, in relationships. There are +only two women in the movie: one is a sex object with mental health +issues (Pugh); the other is a mother/lush, with a surprisingly sharp +tongue and wit but only for a minute (Blunt). The sex scene, and +quoting the famous line, was gratuitous and super weird.

Strauss. I'm not invested at all in the conflict +with Admiral Strauss. Paraphrasing the commentary on this by the good +folks at The Incomparable, "I don't care if some guy +becomes Commerce Secretary in Eisenhower Cabinet."

Given how much I like the history and the science of all +of this, I really pulling for this movie. Dang.

Cleaning Up Photo Duplicates

Sef Kloninger — Mon, 19 Feb 2024 20:00:00 GMT

I did a data cleanup over the weekend. I doubt this is interesting +for anyone else, I just wanted to capture my own notes.

Our family photos are stored up in Google Photos. We have about +100k photos and short movies that take up 0.5TB. I wanted to back +up to local storage, but also I liked the idea of trying the +mdisc format for long-term storage.

I ran Takeout and unzipped everything. I was surprised to see +so many duplicates. Some are copies in the same folders with suffixes +like "(1)", most are multiple copies in different folders. +Takeout seems to just store a copy for each album a photo is in.

Some poking at the data shows 88% of 302,755 files / 72% of +the bytes were unique (histogram at the bottom of the post). Removing +dups can save 200+GB. Sure not really worth the trouble but why +not.

Steps

1. On the NAS, gather file checksums (md5sum) and sizes for all +the files under Takeout/Google Files. Mostly variations on +of find -print0 | xargs -0.

2. Data cleanup to get into nice file paths and clean delimiters, +mostly interactively with vi.

3. Insert into sqlite using .separator and .import.

Here's what the database ended up looking like: Photos and Sizes +are inputs, HashCounts, Duplicates, and Candidates are +outputs.

CREATE TABLE Photos(hash TEXT,path TEXT);
+CREATE TABLE Sizes(size INT,path TEXT);
+CREATE TABLE HashCounts(hash TEXT,found INT);
+CREATE TABLE Duplicates(path STRING,hash STRING,found INT,size INT);
+CREATE TABLE Candidates(path STRING,hash STRING,size INT,pos INT);
+

+ +

4. Find the dups. I'm surprised to find some with 50 or more copies, +but it out some family favorites end up in lots of albums.

insert into HashCounts
+select hash, sum(1) as found from Photos group by hash;
+

+ +

5. Figure out candidates for deletion

insert into Duplicates
+select Photos.path, Photos.hash, HashCounts.found, Sizes.size
+from Photos, HashCounts, Sizes
+where HashCounts.hash=Photos.hash and Photos.path=Sizes.path
+and HashCounts.found > 1;
+

+ +

The first of each set is the one we'll keep.

insert into Candidates
+select *
+from (select path, hash, size, row_number() over (partition by hash) as row_number
+    from (select * from Duplicates order by hash, path desc)
+) where row_number > 1;
+

+ +

I was surprised that sqlplus supports window functions, nice.

The inner reverse-alpha sort on "path" takes care of two cases. +I tend to prefer keeping photos with names that start with +years, and Those come first alphabetically (nice). Also within +folders often there are many copies with "(1)" and "(2)" suffixes +that are generally cruft and most worthy of removing, and those +also sort last alphabetically (nice).

5. Dump out the "Candidates" using .output. Copy back to the NAS. +Do lots of spot checks. Convert to a bash script of rm commands, +run very carefully.

Tools

Sqlite is my go-to tool for ad-hoc work like this. It's fast +and simple, but only for small jobs -- this one is MB-scale.

select name, sum(pgsize) from dbstat group by name;
+name           sum(pgsize)
+-------------  -----------
+Candidates     3006464
+Duplicates     5541888
+HashCounts     11051008
+Photos         23240704
+Sizes          13807616
+sqlite_schema  4096
+

+ +

My Synology is a pretty good place for storage with ability to +ssh in and run local commands. But if I had to do this again, +though, I should have just bought a large locally-connected SSD. +All the transfers to-from the NAS were a hassle. Looking now I'm +stunned that you can get a 4 TB external SSD for under $300.

Some private notes here.

+(source)

My Love Letter to ATP

Sef Kloninger — Mon, 19 Feb 2024 21:20:00 GMT

I was so disappointed in Oppenheimer. It wasn't terrible, but what +a missed opportunity.

Given how much I like the history and the science of all +of this, I really pulling for this movie. Dang.

Cleaning Up Photo Duplicates

Sef Kloninger — Mon, 19 Feb 2024 20:00:00 GMT

I did a data cleanup over the weekend. I doubt this is interesting +for anyone else, I just wanted to capture my own notes.

Steps

1. On the NAS, gather file checksums (md5sum) and sizes for all +the files under Takeout/Google Files. Mostly variations on +of find -print0 | xargs -0.

2. Data cleanup to get into nice file paths and clean delimiters, +mostly interactively with vi.

3. Insert into sqlite using .separator and .import.

Here's what the database ended up looking like: Photos and Sizes +are inputs, HashCounts, Duplicates, and Candidates are +outputs.

CREATE TABLE Photos(hash TEXT,path TEXT);
+CREATE TABLE Sizes(size INT,path TEXT);
+CREATE TABLE HashCounts(hash TEXT,found INT);
+CREATE TABLE Duplicates(path STRING,hash STRING,found INT,size INT);
+CREATE TABLE Candidates(path STRING,hash STRING,size INT,pos INT);
+

+ +

4. Find the dups. I'm surprised to find some with 50 or more copies, +but it out some family favorites end up in lots of albums.

insert into HashCounts
+select hash, sum(1) as found from Photos group by hash;
+

+ +

5. Figure out candidates for deletion

insert into Duplicates
+select Photos.path, Photos.hash, HashCounts.found, Sizes.size
+from Photos, HashCounts, Sizes
+where HashCounts.hash=Photos.hash and Photos.path=Sizes.path
+and HashCounts.found > 1;
+

+ +

The first of each set is the one we'll keep.

insert into Candidates
+select *
+from (select path, hash, size, row_number() over (partition by hash) as row_number
+    from (select * from Duplicates order by hash, path desc)
+) where row_number > 1;
+

+ +

I was surprised that sqlplus supports window functions, nice.

5. Dump out the "Candidates" using .output. Copy back to the NAS. +Do lots of spot checks. Convert to a bash script of rm commands, +run very carefully.

Tools

Sqlite is my go-to tool for ad-hoc work like this. It's fast +and simple, but only for small jobs -- this one is MB-scale.

select name, sum(pgsize) from dbstat group by name;
+name           sum(pgsize)
+-------------  -----------
+Candidates     3006464
+Duplicates     5541888
+HashCounts     11051008
+Photos         23240704
+Sizes          13807616
+sqlite_schema  4096
+

+ +

Some private notes here.

+(source)

My Love Letter to ATP

Sef Kloninger — Wed, 26 Aug 2015 06:25:00 GMT

I listen to a fair number of podcasts. The Accidental Tech Podcast is my favorite by a mile.

It's the one that I look forward to every week. It comes out on @@ -221,81 +328,4 @@ for Google, and YouTube specifically. They tell me that it's a great place to work.

YouTube is one of the worlds foremost platforms for social commentary, education, and free speech. And it's plenty of entertainment too. Sounds -like fun.

Thick Apps Still Lose

Sef Kloninger — Wed, 19 Aug 2015 20:20:00 GMT

Thick apps won mobile. Fine.

On laptop (and desktop) it's not so clear. What is better, thick or thin? -I tend to live mostly in thin land, although I use some thick apps -regularly, like Twitter's Mac client and Apple Photos.

Every so often I give a big native app a try: Excel instead of -Google Sheets, Mail.app instead of Gmail, Reminders -instead of the barebones Tasks built into Gmail. (I can't bring myself -to try Word). But it's disappointing to see how those fancy apps keep -shooting themselves in the foot!

Take for example this Excel error message. Excel is whining that it -can't verify my subscription the first time I ran Excel untethered -(version 15.11.2, for what its worth). Sure you can click through the -warning, but would a newbie know to do that? At best off-putting, at worst -downright disorienting. Why warn me of this at all? And why in a modal -that stops me dead in my tracks?

It seems thick apps should win. They rock the unplugged use case. An even -better situation is flaky networks -- tethered, conference WiFi, -travelling. UI's deal notoriously poorly with intermittent or partial -outages. A thick client, relying on that connection only for hitting -API's, can hide the network.

Another place they should shine is the UI itself. They should be fast, -beautiful, and featureful. Too often they're not. For example I find -Mail.app to be clunky, difficult to customize, and its keyboard shortcuts -few and poorly done. Gmail is pretty good!

Finally there's the upgrade problem. Thick apps need conscious effort from -their users before their work sees user time and they get feedback. And -that's what drives innovation. Long cycles means slower (less) invention. -One example I love is Gmail's "undo send" feature. Boy, you sure do miss -that when you need it and it's not there! That should be on every thick -client by now, but I don't think it is. I do know that Gmail has it and -Mail.app still doesn't.

Maybe the Internet can help. Look at Chrome with its awesome auto -updates. What makes this work is solid engineering and exceptional quality -control. I've never seen behind the Google curtain, but I bet there's no -magic, just a lot of good engineering that leads to good software. Like: -good design and code reviews, tons of test coverage across many scenarios, -diverse and well-instrumented canaries, and thorough performance and -resource use testing. If Google didn't all of that so well, then we -wouldn't accept frequent pushes. Without the frequent upgrade cycle, -Chromes feature cycle would languish.

Electron is another bright spot. This is the framework that gives -Slack and GitHub's thick clients their fit and finish. It makes these feel -like true native apps, even though they are mostly web controls with -JavaScript the covers. Right-clicking still doesn't do what I want, and -text controls are finicky, but it's close. But what those rough edges buy -you, and the software producer, are frequent, reliable, and clean -upgrades.

My natural preference would be for thick apps. If they were done well, -I'd use them.

My Next Job

Sef Kloninger — Sun, 10 May 2015 16:30:00 GMT

I left my last job a few weeks back and it's high time to look for a new -one. If you're working on something interesting and think I could help, let -me know!

It's nice to not have a day job while looking for another. I was lucky enough -to do this once before in 2012 which turned out great. I learned then -that time and flexibility lets you talk to lots of friends and learn about a -breadth of projects. I found a fun project in a new domain (online -education), something I doubt I'd have found the normal way.

Maybe I'll get lucky again.

Enough small talk, what am I looking for?

I'm looking for some flavor of line manager. I'm a good senior manager and -code-every-day engineer; but I'm exceptional leading a team and running a -project. That's what line managers do: lead engineers, not other managers or -departments or matrix-anything. Also, if you're some kind of executive then -coding is an indulgence, and I'd rather it just be part of my job. Mostly I'm -talking to small companies, say 10-100 people (fun-size).

I want to build on my experience. I know infrastructure and cloud, SaaS -and enterprise, and online education. I'm probably not the best person for -your storage, security, gaming, e-commerce, or cryptocurrency company. I want -to stay working on Internet technology. I like the (micro)services model. -For my own projects I choose Python, JavaScript (frontend and backend), and -Java. I know web operations, especially the Amazon stack.

Location is important: I don't want to do a daily Menlo Park to San -Francisco round-trip. I'd like to work with friends if possible. And I -want to do something worthwhile.

You can always get to my resume from the header here, or via this short -link. I'm open to a bunch of things, just no kick boxing. Let's -have coffee/drink or take a walk.