Question : Help with Shell Script: find and move duplicate files in a directory tree
Heya, folks. I've got a directory tree with 60+ GB of text documents (PDF, CHM, TXT, DOC) in a complex hierarchy. I need a script that will:
A) Walk the entire directory tree looking for file names that are duplicates or near-duplicates.
B) Place all the dupes in a separate directory, divided into groups A and B.
Find plus a nice regular expression would probably do the bulk of the work, but I'm new to scripting, so I don't know if that's the right place to start, nor (assuming it is) where to go from there. Should I walk the entire tree, build a database of every file in it, and parse that for duplicates? Or is there another way to do it on the fly?
Any help or suggestions would be appreciated. There aren't any other folks I can turn to for help on this one.
Thanks much --
KiT.
Answer : Help with Shell Script: find and move duplicate files in a directory tree
You more or less identified it yourself: you'd be better off getting a million monkeys to do it manually than relying on a single script ;-)
Anyway, here are a few steps to get you closer:
# 1. List every *.txt file as "basename  full-path", sorted by basename:
find . -name \*.txt | awk -F/ '{print $NF" "$0}' | sort
# 2. Same, but lowercased so "Foo.txt" and "foo.txt" compare equal:
find . -name \*.txt | awk -F/ '{print $NF" "$0}' | tr '[A-Z]' '[a-z]' | sort
# 3. Print only the basenames that occur more than once:
find . -name \*.txt | awk -F/ '{print $NF}' | tr '[A-Z]' '[a-z]' | sort | uniq -d
# 4. Group all paths under their shared basename:
find . -name \*.txt | awk -F/ '{f[$NF]=sprintf("%s %s",f[$NF],$0)} END{for (d in f) print d":"f[d]}' | sort
# 5. Group paths by file size (field 7 of "find -ls" output) instead of by name:
find . -name \*.txt -ls | awk '{f[$7]=sprintf("%s %s",f[$7],$NF)} END{for (d in f) print d":"f[d]}' | sort
# 6. Classify each file with file(1) and sort on the reported type:
find . -name \*.txt -exec file {} \; | sort -k3
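If you want to go one step further and actually move the name-duplicates aside, here is a rough sketch that builds on the third one-liner above. The DUPES directory name, the temp file location, and the "match on lowercased basename only" rule are all my assumptions, so adjust to taste and test on a copy of the tree first:

#!/bin/sh
# Sketch: park every file whose (lowercased) basename occurs more than
# once under ./DUPES.  Directory name, temp file, and the basename-only
# matching rule are assumptions, not a finished solution.

SRC=.                 # top of the tree to scan
DEST=./DUPES          # where the dupes get moved
LIST=/tmp/all_files.$$

mkdir -p "$DEST"

# One line per file: "lowercased-basename<TAB>full-path".
find "$SRC" -type f ! -path "$DEST/*" |
awk -F/ '{ print tolower($NF) "\t" $0 }' > "$LIST"

# Read the list twice: first count each basename, then print the paths
# of the names seen more than once, and move each file with a numeric
# prefix so two copies of "foo.txt" don't overwrite each other in $DEST.
n=0
awk -F'\t' 'NR==FNR { seen[$1]++; next } seen[$1] > 1 { print $2 }' "$LIST" "$LIST" |
while IFS= read -r path; do
    n=$((n + 1))
    mv "$path" "$DEST/${n}_$(basename "$path")"
done

rm -f "$LIST"

That only catches exact name matches (case-insensitively); it doesn't attempt near-duplicates, and it doesn't split the results into your A/B groups. If you'd rather match on content than on name and you have GNU coreutils, something like this lists every file whose checksum occurs more than once:

find . -type f -exec md5sum {} + | sort | uniq -w32 -D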