Duplicate finder software...

Started by spacedrone808, June 20, 2021, 13:18:46

spacedrone808

...of course for tracker music. Any ideas?

Saga Musix

Duplicates on which level? Exact byte-for-byte copies? Modules that are "mostly the same" but maybe with a few bytes of difference? Modules sharing the same samples? etc...
» No support, bug reports, feature requests via private messages - they will not be answered. Use the forums and the issue tracker so that everyone can benefit from your post.

spacedrone808

Byte-for-byte comparison can be achieved with generic duplicate finders.
I suppose that song name and author as matching criteria would be very neat. Yeah, I know that each tracker has its own "exif" signature, but what if I'm missing something?
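For reference, here is a minimal Python sketch of how a generic byte-for-byte duplicate finder typically works. It is not any particular tool's implementation, and the function names are illustrative. Files are first bucketed by size, which is cheap, and only files that share a size are hashed (SHA-256 here). Each resulting group lists every copy of a file, not just one.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` (recursively) that are byte-for-byte identical."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have an exact copy
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        # Keep only hashes shared by two or more files, listing all copies
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

This only ever finds exact copies. A module that was re-saved with a different title, or re-saved by a different tracker, produces a different hash.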

Saga Musix

Finding duplicates by song name is probably rather useless: song names are far from unique, and many modules don't set a song title at all. Author information is also missing from most modules.
I have been working on and off on a tool called Mod Library. I never had the time to bring it much beyond the alpha stage, but I occasionally add new features. As it can already find files that are 100% identical, and as it already keeps track of all pattern contents, I can probably update it to also find modules with identical patterns, which I have found to be a very effective method of finding duplicate songs. I'll try to do that one of these days.
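The thread doesn't show how Mod Library compares patterns. As a rough sketch of the general idea of comparing pattern contents instead of whole files, the following Python hashes only the pattern data of a classic 31-sample ProTracker MOD. The sketch assumes a 4-channel file with the "M.K." signature at offset 1080, patterns starting at offset 1084, and 1024 bytes per pattern. Two files that differ only in song title, sample names, or sample data therefore get the same hash. Taking the pattern count as the highest order-table entry plus one is a common convention. S3M, XM, IT and other formats store patterns differently and would need their own parsers. `mod_pattern_hash` is a hypothetical name.

```python
import hashlib
from typing import Optional

# Standard offsets of a 31-sample ProTracker MOD (this sketch only handles
# 4-channel files tagged "M.K.").
ORDER_TABLE_OFFSET = 952
ORDER_TABLE_SIZE = 128
SIGNATURE_OFFSET = 1080
PATTERN_DATA_OFFSET = 1084
PATTERN_SIZE = 64 * 4 * 4  # 64 rows x 4 channels x 4 bytes per cell


def mod_pattern_hash(data: bytes) -> Optional[str]:
    """Hash only the pattern data of an M.K. MOD; return None otherwise."""
    if data[SIGNATURE_OFFSET:PATTERN_DATA_OFFSET] != b"M.K.":
        return None  # not a 4-channel M.K. module (or too short)
    orders = data[ORDER_TABLE_OFFSET:ORDER_TABLE_OFFSET + ORDER_TABLE_SIZE]
    num_patterns = max(orders) + 1  # highest pattern referenced, plus one
    end = PATTERN_DATA_OFFSET + num_patterns * PATTERN_SIZE
    if len(data) < end:
        return None  # truncated file
    # Title, sample headers and sample data are deliberately left out
    return hashlib.sha256(data[PATTERN_DATA_OFFSET:end]).hexdigest()
```

Grouping files by this hash instead of a whole-file hash catches re-titled or re-sampled copies. However, it will also group different songs that happen to share all their patterns.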

spacedrone808

Ohaaa, very cool project. But no binaries though?

Saga Musix

I'll upload some binaries later today. As said, it's in alpha stage right now, and at this point I have no intent to keep backwards compatibility with previous database versions (I just recreate my own library whenever there's a breaking change) so I normally don't upload binaries for it, unless someone asks for them.

spacedrone808

I see... If I'm not mistaken, there was similar software on the maz-sound site.
But I can't remember its name...

Saga Musix

Here's a current test build of Mod Library. After adding some files or folders to the database, you can hit the button for finding duplicates. Note that at the moment it only shows one file per duplicate, but for finding the actual duplicates it's at least a start. Hopefully it will later show all versions of a duplicate file.

Note that adding files to the database is relatively slow at the moment (adding 10,000 files can take about an hour or so) because it's single-threaded and an audio fingerprint is generated for every file. This allows you to enter an AcoustID in the search and find songs that sound similar to that AcoustID. However, the search results for that feature aren't that good yet; IIRC they find a lot of stuff that really is a completely different song.
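The thread doesn't say how Mod Library compares fingerprints. Chromaprint, the library behind AcoustID, produces a raw fingerprint as a sequence of 32-bit integers. A common simplified way to compare two of them is the fraction of matching bits, optionally trying a few alignment offsets. The Python sketch below assumes the raw fingerprints are already available as lists of ints. It is a toy similarity measure, not AcoustID's actual matching, and `bit_similarity` is a hypothetical name.

```python
def bit_similarity(a: list[int], b: list[int], max_offset: int = 0) -> float:
    """Best fraction of matching bits between two raw fingerprints.

    Each item is treated as a 32-bit value (negative ints are masked to
    their 32-bit two's-complement form). `b` is slid against `a` by up to
    `max_offset` items in either direction, comparing only the overlap.
    """
    best = 0.0
    for offset in range(-max_offset, max_offset + 1):
        if offset >= 0:
            pairs = list(zip(a[offset:], b))
        else:
            pairs = list(zip(a, b[-offset:]))
        if not pairs:
            continue  # no overlap at this offset
        # Count differing bits via XOR, limited to the low 32 bits
        differing = sum(bin((x ^ y) & 0xFFFFFFFF).count("1") for x, y in pairs)
        best = max(best, 1.0 - differing / (32 * len(pairs)))
    return best
```

Two unrelated fingerprints already agree on about half their bits by chance, so any useful threshold has to sit well above 0.5. A loose threshold easily matches completely different songs.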

spacedrone808

Thank you for posting, I'll check it out in action.

Tigoro

I use mo3enc for xm/mod/it/s3m/mtm modules:
1) mo3enc -m4 -rmiso on the files (no compression, removes any text in the modules)
Wait time...
2) Find duplicates among the *.mo3 files
Wait time...
3) Remove the cloned modules from the original collection
I found a lot of duplicates on ModLand in some formats (for example, stm2mod or mod2stm conversions, or other 4-16k module formats), and identified many tracks in the Keygen collection (who made the music, i.e. the authors).
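Steps 2 and 3 above can be sketched in Python as follows. The sketch assumes that each original module `foo.ext` in one flat folder was encoded to `foo.mo3` in another folder with the same file stem, and that stems are unique. Both the naming convention and the function name `removal_candidates` are hypothetical, not mo3enc behaviour. Within each group of identical `.mo3` files, the original with the first sorted path is kept and the others are returned as candidates. Nothing is deleted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def removal_candidates(originals: Path, mo3_dir: Path) -> list[Path]:
    """Return originals whose .mo3 encodes are identical to another file's."""
    # Assumption: foo.mod / foo.s3m / ... was encoded to foo.mo3 (unique stems)
    originals_by_stem = {p.stem: p for p in originals.iterdir() if p.is_file()}
    groups: dict[str, list[Path]] = defaultdict(list)
    for mo3 in sorted(mo3_dir.glob("*.mo3")):
        digest = hashlib.sha256(mo3.read_bytes()).hexdigest()
        original = originals_by_stem.get(mo3.stem)
        if original is not None:
            groups[digest].append(original)
    candidates: list[Path] = []
    for paths in groups.values():
        _kept, *clones = sorted(paths)  # keep one original per group
        candidates.extend(clones)
    return sorted(candidates)
```

Identical MO3 output only shows that two files agree on what MO3 keeps, with text already stripped. Reviewing the list by hand before deleting anything is still prudent.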

Saga Musix

Note that you can report duplicates on ModLand to the contacts mentioned in readme_welcome.txt at the root of the server. MO3 can remove a lot of "unnecessary" metadata, but it isn't guaranteed to find every duplicate that went through the format conversions you mentioned, because MO3 stores data differently depending on what the original source format was (so e.g. a MOD converted to MO3 and the same song converted from S3M to MO3 will not result in the same MO3 file).

Tigoro

Quote from: Saga Musix on June 25, 2021, 08:44:52
Note that you can report duplicates on ModLand to the contacts mentioned in readme_welcome.txt at the root of the server. MO3 can remove a lot of "unnecessary" metadata but it doesn't guarantee to find every duplicate that went through the format conversions you mentioned, because MO3 stores data differently depending on what the original source format was (so e.g. a MOD converted to MO3 and then the same song converted from S3M to MO3 will not result in the same MO3 file).
To be honest, I'm afraid of flooding the project with information. Coma had been sorting through my collection of tracker software for many years :-) I have a fresh snapshot of ftp.modland.com, so I'll try it. Yes, not every duplicate, but many; how many is unknown.

Saga Musix

Coma handed over administration of the archive to Menace, and he's eager to clean things up. :) Don't hesitate to report duplicates, they are frequently removed.