Tool: Find File (and Photo) Duplicates - CZKAWKA


Project repo: qarmin/czkawka on GitHub

CZKAWKA is the duplicate finder I spent far too long searching for.

It doesn’t just detect exact, bit-for-bit duplicates — it also comes with several powerful modules for tasks like identifying:

  • Similar images
  • Similar videos
  • Empty directories
  • Empty files
  • Temporary files
  • Music duplicates
    …and more.

One of my favorite features is the ability to define a reference folder. You can mark one location as the “ground truth”, and CZKAWKA will compare everything else against it, flagging only the other copies as duplicates. That fits perfectly with how I’m trying to clean up the mountain of old files I accumulated over the years — backups in random locations, partial copies, disorganized folders… you name it.

(I merged all the images from similar sources, e.g. camera folders, into one location. Now I need to check which of them are already in the ‘good and sorted’ location.)
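To make the reference-folder idea concrete, here is a minimal Python sketch of the concept for exact duplicates. It is not how CZKAWKA is implemented; the function names (`file_digest`, `find_copies_of_reference`) and the choice of SHA-256 are my own assumptions for illustration. Files in the reference folder are never flagged; only files elsewhere that match one of them are reported.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_bytes=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_bytes), b""):
            h.update(block)
    return h.hexdigest()


def find_copies_of_reference(reference_dir, other_dirs):
    """Map each file outside the reference folder to the reference file it duplicates.

    Hypothetical helper: the reference folder is treated as ground truth,
    so its own files are never reported.
    """
    # Index the reference folder by content hash.
    reference = {}
    for path in sorted(Path(reference_dir).rglob("*")):
        if path.is_file():
            reference.setdefault(file_digest(path), path)

    # Flag only files in the other locations whose content is already known.
    flagged = {}
    for root in other_dirs:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                original = reference.get(file_digest(path))
                if original is not None:
                    flagged[path] = original
    return flagged
```

Called as `find_copies_of_reference("sorted", ["unsorted"])` (placeholder folder names), it returns the files under `unsorted` that already exist in `sorted`, which is the list you could safely review for deletion.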

No matter why your data got messy, CZKAWKA is the tool that will finally let you get it under control!

The Compare view

Smart Similarity Matching for Images

CZKAWKA can detect similar images, not just exact copies. You can choose between different hashing algorithms and chunk sizes to find images that are alike even if their resolution and file size differ.

There’s a similarity slider too, so you can choose how close two images need to be before they’re flagged as duplicates. If that is not strict enough, increase the chunk size, which raises the bar for how similar images need to be.
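For readers curious how this kind of matching can work in principle, here is a small Python sketch of a simple "average hash" compared by Hamming distance. This is a generic textbook technique, not CZKAWKA's actual code, and every name in it (`average_hash`, `hamming_distance`, `is_similar`, `hash_size`, `max_distance`) is hypothetical. I am assuming that `hash_size` plays a role loosely comparable to the chunk size setting and `max_distance` to the similarity slider. Images are plain lists of grayscale rows, so the sketch needs no image library.

```python
def average_hash(pixels, hash_size=8):
    """Hash a grayscale image given as a list of rows of 0-255 values.

    The image is reduced to hash_size x hash_size cells by block averaging;
    each cell becomes 1 if it is brighter than the overall mean, else 0.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for row in range(hash_size):
        top = row * height // hash_size
        bottom = max((row + 1) * height // hash_size, top + 1)
        for col in range(hash_size):
            left = col * width // hash_size
            right = max((col + 1) * width // hash_size, left + 1)
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return [1 if value > mean else 0 for value in cells]


def hamming_distance(hash_a, hash_b):
    """Count the positions where two equally long hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_similar(pixels_a, pixels_b, max_distance=5, hash_size=8):
    """Treat two images as similar if their hashes differ in few bits."""
    distance = hamming_distance(average_hash(pixels_a, hash_size),
                                average_hash(pixels_b, hash_size))
    return distance <= max_distance
```

Because the hash is computed from a heavily downsampled version of the picture, the same photo at two resolutions tends to produce the same or nearly the same bits, while unrelated pictures differ in many positions.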

Using the UI
(As you can see, it found duplicates even though the resolution and size varied widely.)

Running It Remotely? There’s a Docker for That

I really like CZKAWKA, but with large data sets I don’t want to stream everything over the network to my local machine for comparisons (that would take forever).

At first, I thought about running it remotely via X-forwarding. But I didn’t want to install the whole tool directly on my server. So I looked into containerizing it — and luckily, someone already beat me to it!

You can run CZKAWKA in a Docker container and stream the GUI straight to your browser via VNC:
👉 jlesage/docker-czkawka

This is ideal if you want a clean separation between your data tools and your system — or if you’re just a fan of browser-based remote control.


If you’re stuck with a messy mountain of files and want to start fresh without losing anything important, give CZKAWKA a try.

Cheers
~Horo


Hi, I'm HoroTW. I'm a software engineer and data scientist student based in Germany. You can see some of my work on GitHub or Codeberg, watch my videos on YouTube, read some more about me on my website.