User:The Anomebot2

Note: Blocking will stop further edits. The bot will intermittently retry errors for several minutes, but should then shut itself down automatically until restarted manually; please use a block of ten minutes or longer to be sure of stopping it.

This bot is designed to add standardized machine-readable geodata records to relevant articles in the English-language Wikipedia, using data from GNS, GNIS, OSGB coordinates in UK articles, plaintext geodata scraped from article text, and interwiki-linked geotag data from other-language Wikipedias. -- The Anome 12:13, 22 September 2007 (UTC)

Status
 71,415 geotags added to date. -- The Anome 12:06, 22 September 2007 (UTC)

To do
-- The Anome 12:12, 22 September 2007 (UTC)
 * Standardize existing geotags.
 * Scan for unusual/broken parameters in infoboxes.
 * Start work on standardizing infoboxes.

Forthcoming attractions
With ~70,000 data points, I now have enough data to do a spatial analysis of the category tree, and to generate lists of possibly misclassified or mislocated outliers. The cleaned-up bounding data could then be used as the basis of a Bayesian classifier for future work. -- The Anome 10:14, 24 August 2007 (UTC)
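
As a rough illustration of the outlier idea, the sketch below flags articles in a single category whose coordinates lie far from the category's median centre. It is not the bot's actual code: the function names, the `factor` and `min_km` thresholds, and the sample data are all assumptions made up for the example.

```python
import math
from statistics import median


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def find_outliers(points, factor=5.0, min_km=50.0):
    """Return titles lying unusually far from the category's median centre.

    points maps an article title to a (lat, lon) pair in decimal degrees.
    A title is flagged when its distance from the centre exceeds
    max(min_km, factor * median distance). Both thresholds are
    hypothetical tuning values, not figures from the bot.
    """
    if len(points) < 3:
        return []  # too few points to say anything about spread
    # Median of each coordinate is a crude but robust centre.
    # Assumption: the category does not straddle the antimeridian.
    centre_lat = median(lat for lat, _ in points.values())
    centre_lon = median(lon for _, lon in points.values())
    distances = {
        title: haversine_km(centre_lat, centre_lon, lat, lon)
        for title, (lat, lon) in points.items()
    }
    threshold = max(min_km, factor * median(distances.values()))
    return sorted(title for title, d in distances.items() if d > threshold)


# Made-up sample: four nearby places and one far-away mismatch.
sample_category = {
    "Place A": (50.70, -3.50),
    "Place B": (50.80, -3.60),
    "Place C": (50.60, -3.40),
    "Place D": (50.75, -3.55),
    "Place E": (35.00, 139.00),
}
suspects = find_outliers(sample_category)
```

Using medians rather than means keeps a single badly mislocated article from dragging the centre towards itself, which is the case this check is meant to catch.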

Current problems
Because of severe name ambiguity problems, Japanese locations are now filtered out of most machine-matched geodata sets.

Recent Canadian data has had similar problems, and is now also filtered from the output of several matching algorithms. -- The Anome 12:17, 22 September 2007 (UTC)
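
A filter of this kind might look something like the following sketch, which drops candidate matches from excluded countries and then discards any remaining title that matches more than one gazetteer record. The record layout, the use of two-letter country codes, and the function name are assumptions for illustration only, not the bot's real data model.

```python
from collections import Counter

# Assumption: candidates carry ISO 3166-1 alpha-2 country codes.
EXCLUDED_COUNTRIES = {"JP", "CA"}


def filter_matches(candidates, excluded=EXCLUDED_COUNTRIES):
    """Keep only candidates that are safe to use for automatic tagging.

    candidates is a list of dicts with hypothetical keys
    'title', 'country', 'lat' and 'lon'.
    """
    # Step 1: drop countries known to give unreliable name matches.
    kept = [c for c in candidates if c["country"] not in excluded]
    # Step 2: drop titles that still match more than one record.
    counts = Counter(c["title"] for c in kept)
    return [c for c in kept if counts[c["title"]] == 1]


# Made-up sample records.
sample_candidates = [
    {"title": "Springfield", "country": "US", "lat": 39.80, "lon": -89.65},
    {"title": "Springfield", "country": "US", "lat": 42.10, "lon": -72.59},
    {"title": "Sakura", "country": "JP", "lat": 35.72, "lon": 140.22},
    {"title": "Exeter", "country": "GB", "lat": 50.72, "lon": -3.53},
]
safe = filter_matches(sample_candidates)
```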