User:Multichill/Categorization bot
Idea for a new categorization bot.
The current bot (imagerecat.py) depends on Commonsense, which is often slow or, even worse, broken. The new bot should be standalone.
At the moment about 40,000 files are uncategorized but in use at some project: query and result. It would be nice if a bot could categorize these files.
The bot should work on a list of images (for example, the list of used uncategorized files). For each image:
- See where the image is used.
- For each article, see if it contains a Commonscat link
- If the article doesn't contain a Commonscat link, go work on the categories and interwikis of the article
- The process should be recursive with finding a link to Commons as the stop condition.
- There should be some sort of scoring
  - The earlier a category is found, the higher its score
  - Multiple hits for the same category combine into a higher score
- There should be some kind of filtering
  - Blacklist
  - Skip hidden categories
  - (etc.)
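
The steps above could be sketched roughly as follows. This is only an illustration, not the actual bot: the `wiki` dictionary is a hypothetical in-memory stand-in for the real wikis (a real bot would fetch usage, categories and interwikis through the API), the names `suggest_categories`, `used_on`, `related` and `max_depth` are made up for this example, and the score of 1 / (depth + 1) per hit is just one possible way to make earlier hits count more.

```python
from collections import defaultdict, deque


def suggest_categories(used_on, wiki, blacklist=frozenset(),
                       hidden=frozenset(), max_depth=3):
    """Return (commons_category, score) pairs, best first.

    used_on:   titles of the pages where the image is used.
    wiki:      hypothetical stand-in for the wikis; maps a page title to
               {"commonscat": str or None, "related": [titles]}, where
               "related" holds the page's categories and interwikis.
    blacklist: Commons categories that must never be suggested.
    hidden:    hidden Commons categories, skipped as well.
    max_depth: assumed safety limit so the search always terminates.
    """
    scores = defaultdict(float)
    start = list(dict.fromkeys(used_on))  # drop duplicates, keep order
    seen = set(start)
    queue = deque((title, 0) for title in start)

    while queue:
        title, depth = queue.popleft()
        page = wiki.get(title)
        if page is None:
            continue

        category = page.get("commonscat")
        if category:
            # Stop condition: this branch reached a link to Commons.
            if category not in blacklist and category not in hidden:
                # Earlier hits score higher; repeated hits add up.
                scores[category] += 1.0 / (depth + 1)
            continue

        # No Commonscat link: continue with categories and interwikis.
        if depth < max_depth:
            for related in page.get("related", []):
                if related not in seen:
                    seen.add(related)
                    queue.append((related, depth + 1))

    # Highest score first; ties are broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up example data: one article links to Commons directly, the other
# only reaches Commons through two of its interwikis.
EXAMPLE_WIKI = {
    "nl:Amsterdam": {"commonscat": "Amsterdam", "related": []},
    "nl:Dam": {"commonscat": None,
               "related": ["en:Dam Square", "de:Dam"]},
    "en:Dam Square": {"commonscat": "Dam Square", "related": []},
    "de:Dam": {"commonscat": "Dam Square", "related": []},
}

suggestions = suggest_categories(["nl:Amsterdam", "nl:Dam"], EXAMPLE_WIKI)
```

The search is breadth-first, so pages closest to the image are looked at first, and the `seen` set keeps interwiki loops from being followed forever. Whether the top suggestion is applied automatically or only above some score threshold is left open here.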