Wikidata:Property proposal/Imagehash difference hash

Imagehash difference hash

Originally proposed at Wikidata:Property proposal/Sister projects

Description: Imagehash difference hash is a hash that indicates whether two images look nearly identical.
Represents: Imagehash difference hash (Q124969714)
Data type: String
Domain: mediainfo (Commons only)
Allowed values: [a-z\d]{16}
Example 1: M68454019 → e1f1c6c4c4c0e9ca
Example 2: M68456558 → b0e47a8ac4c6c4c8
Example 3: M68455617 → ecede1f06d23fc47
Example 4: M68456184 → 92d8c1c491ccc0e8
Source:
Planned use: First I would populate hash values for photos uploaded by user:FinnaUploadBot, but more generally the hash could be added to all Commons files.
Number of IDs in source: currently there are 100M files in Commons, and the checksum can be calculated for all photos
Expected completeness: eventually complete (Q21873974)
Robot and gadget jobs: the checksum should be generated by a bot
See also
  • checksum (P4092): small-sized datum derived from a block of digital data for the purpose of detecting errors. Use qualifier "determination method" (P459) to indicate how it's calculated, e.g. MD5.
  • Wikidata:Property proposal/Imagehash perceptual hash
    Motivation

    Same as with the pHash proposal: I am using pHash and dHash checksums to detect duplicate photos in Commons. I am also using pHashes and dHashes to confirm whether photos in the Commons and Finna repositories are the same. However, it would be useful if the hashes could be shared so that any user could query them. Pre-generated perceptual hashes of files could also be fetched from SDC as a list without downloading the actual files. However, because the hashes vary slightly (due to scaling and compression), matching is much more robust when false negatives and false positives are filtered out with a second hash calculated using a different method, so I would add the dHash to the uploaded files too. --Zache (talk) 11:51, 18 March 2024 (UTC)
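
    As a rough illustration of the two-hash matching described above, the sketch below compares stored hash strings by Hamming distance and accepts a match only when both the dHash and the pHash agree. It is a minimal sketch, not the bot's actual code: the function names and the `max_bits` threshold of 4 are hypothetical, and it assumes each hash is a 64-bit value stored as 16 hexadecimal characters, as in the examples of this proposal.

```python
import re

# "Allowed values" pattern from this proposal. It is looser than strict
# hexadecimal, so int(..., 16) below still rejects letters outside a-f.
HASH_PATTERN = re.compile(r"[a-z\d]{16}")


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two 64-bit hashes given as hex strings."""
    for value in (hash_a, hash_b):
        if not HASH_PATTERN.fullmatch(value):
            raise ValueError(f"not a 16-character hash: {value!r}")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def looks_like_same_image(dhash_a: str, dhash_b: str,
                          phash_a: str, phash_b: str,
                          max_bits: int = 4) -> bool:
    """Accept a match only if BOTH hash types are within max_bits (assumed threshold)."""
    return (hamming_distance(dhash_a, dhash_b) <= max_bits
            and hamming_distance(phash_a, phash_b) <= max_bits)
```

    Requiring both distances to be small is what filters out pairs where only one of the two hashes happens to collide; a suitable threshold would have to be tuned on real Commons data.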

    Discussion

    @Abbe98, Multichill, Jura1, Tinker Bell: pinging those who were interested in pHash. Regards, ZI Jony (Talk) 17:02, 27 March 2024 (UTC)

     Support Complements pHash for additional reliability in matching images. Ipr1 (talk) 21:16, 27 March 2024 (UTC)

    You're welcome! Regards Kirilloparma (talk) 15:58, 30 March 2024 (UTC)