Knowledge:Knowledge Signpost/2018-10-28/Special report 2

28 October 2018

Special report 2

Now Wikidata is six

Now Wikidata is six: SPARQL adds sparkle to WMF projects.

By Charles Matthews

Charles Matthews began regular Knowledge editing in 2003. He is currently Wikimedian in Residence at ContentMine, in Cambridge, working on the ScienceSource project. He has created over 12,000 articles on the English Knowledge where he has made over 300,000 edits from a global count of almost one million edits to Wikimedia projects.

[Image: When Wikidata was three: Magnus Manske (centre) with an award for Wikidata]

Wikidata's sixth birthday falls on 29 October, and will be celebrated by 34 Wikimedia events worldwide. Last year, indeed, there was WikiDataCon in Berlin. This time around, the cakes will be distributed far and wide.

Some people, I suppose, will still not buy into the acclaim. Here's a personal story.

I get started on Wikidata

My earliest Wikidata edits had slipped my mind. It turns out that what I added initially was the first actual statement to the item on Winston Churchill. When that item had only sitelinks to Wikipedias, I linked it also to Churchill's father Randolph, in February 2013. It was a few days after I had set up a Cambridge meetup, which was probably the reason why I thought I should take a look.

The mix'n'match tool, one of Wikidata's huge successes, was written by Magnus Manske in 2013, though in nothing like today's form. After I made a feature request for a Wikisource tool at a meetup, he replied "I have a better idea", which is now undeniable. In any case, the initial datasets on the tool were catalog 1, from the Oxford Dictionary of National Biography (ODNB); and catalog 2, now the Art UK person ID, that started off under the older name "BBC Your Paintings".

It was not, however, until after Wikimania 2014 in London that I was really drawn into Wikidata editing. That was by a problem to solve, namely how many ODNB biographies were of BBC Your Paintings artists. These days I take it for granted that I can write an easy SPARQL query and bring up an answer: as of this writing, Wikidata knows about 2088 of these matches. At the time, the two British cultural institutions were interested in this question, and were going at it by traditional methods. I also had at the back of my mind another problem: how many ODNB women were missing from the English Knowledge? Carbon Caryatid had asked me that.

So, as Wikidata turned two, I started to put time into the particular biographical area that was being opened up. By 2018 standards this was still pioneer stuff. There was no SPARQL yet, though there was Magnus's substitute Wikidata Query Service (WDQ). Not only that, but matching implies you can solve the disambiguation question for people, and in the worst cases that can be really hard. When a common name, say "William Smith", came up, you were faced with a list that could run to several pages of hits on the exact name. Typically most of those items were undescribed, and very sparse when it came to biographical facts.

Becoming serious

[Image: Andrew Gray in 2014]

As an extension of what I had been doing on Wikisource with the first edition and English Knowledge, matching the ODNB on Wikidata was a natural project. Andrew Gray did early spadework there, with Magnus providing the tech support, and we three began a long series of emails, wondering about getting other biographical datasets into mix'n'match. Andrew's pair of Wikidata blogposts from 2014 give the flavour.

Much heavy lifting was going on in early 2015: at that time the mix'n'match tool in gamified mode was my common approach: improved automatching has taken some of the fun, and most of the low-hanging fruit, from that mode of using it. Importantly, as the Wikidata community relaxed its view on notability, non-matched catalog entries were no longer just parked for later consideration: either they were created as items at once, or they were marked "N/A" and left on the back-burner.

So arose the project of creating items for all ODNB entries. In other words, all ODNB topics, around 60,000 of them, which include academic arcana such as "Women in trade and industry in York (act. c. 1300 – c. 1500)" and its 14 examples, would be considered Wikidata-notable. This liberal interpretation of notability that came on stream in 2015 had some odd effects, when minor figures from other catalogs got items, but now that Wikidata is at the scale of 50 million items one hardly sees it as causing genuine problems. Typical databases in the cultural sector, for example the British Museum's, contain many sparse entries. And (worse) often entries that one cannot match because there is no adequate identification provided. Wikidatans do delete such things.

In mid-2015, I pushed through the final stages of ODNB matching into Wikidata and, with Magnus, I helped select which further Wikidata items for Art UK artists should be created. The very interesting BBC television series Britain's Lost Masterpieces makes Art UK's work vividly accessible, as the presenter Bendor Grosvenor pokes around in gallery storage looking for "old masters". Wikidata allows easier access to minor artists, with less dust and cobwebs, since over 22,000 Art UK artists, some 60% of the total, are represented. These can often be identified and merged with other items, though the scholarly challenges in a Wikidata merge can be quite serious (and instructive) just because the notability standards are quite relaxed. Consequently, I have been in a number of meetings with Art UK, explaining Wikidata.

To answer the "missing women from ODNB" question, there was the matter of filling in the "sex or gender" field, and then writing some standard SPARQL. I could see that the answer was about 2,000. The advance from the number to an actual redlink list, created by the ListeriaBot, was still major.

More discovery

[Image: English Wikisource, growth of proof-reading directly against scans in the early 2010s]

I came to the ODNB through the Dictionary of National Biography (DNB), specifically the Victorian version edited initially by Leslie Stephen, and I found the DNB on Wikisource. Back in 2010, I gave a talk based on about a year proof-reading the DNB at the Annual General Meeting of Wikimedia UK. There was and still is a discoverability issue with Wikisource, where the French version alone has a million texts. How does one locate texts on a given topic? The category system is not really designed for that, and in any case is used inconsistently by the various languages.

It turned out in 2015 that Wikidata potentially could solve this problem. Any Wikisource text, however short, warrants its own Wikidata item about the text itself. That item can have a "main subject" statement highlighting the topic. Run queries on such statements and you have a language-independent mechanism for finding material you want on Wikisource. Bots started posting Wikidata items for all the articles in big works such as the DNB (around 30,000 of them), and in doing so created outline entries for them in Wikidata.

I took up this direction in 2016, completing the identification of main subjects for the 30,000 metadata items of the DNB. I remember it being incredibly hard graft, even though the previous round of ODNB work meant that all the subjects of the biographies were already there. A proper matching process got into side issues and required, for good conscience, large amounts of cleanup with plenty of merging of duplicates. Fortunately, merging is easier on Wikidata than on Knowledge.

In 2017, I was rewarded for investing so much of my time: once I had made a key advance in my SPARQL understanding, I was able to write queries to remove the need for patrolling I did on Wikisource to see which Knowledge articles covered the DNB topics. In a neat plot twist, it was a tool for this kind of patrolling that comes into my anecdote about the origin of mix'n'match above. I learned the facet of SPARQL I needed from Jheald. With the help of a Petscan query, I can do my patrolling without much effort, and so thank those here who create articles on DNB topics. The articles show up once they have a Wikidata item (caveat here about some needed merging), which frees my time to be better spent working on the backlog of articles that don't.

SPARQL users, I should say parenthetically, form a good and collaborative community in my experience. I use the amazing full text search in mix'n'match most days for biographical research, and it too has functioning communities, the matchers and the uploaders of datasets. The more recent TopicMatcher tool supports main subject work (including "depicts" for pictures), and therefore has the potential to take some of the grind out of the discovery trail. Specialised software is a large factor in the development of Wikidata, not just bots, though they still play a massive role as well.

WikiFactMine and ScienceSource

In April 2017, I started work at ContentMine, an unconventional Cambridge tech startup, as Wikimedian in Residence for the WikiFactMine project supported by the Foundation. There I had T Arrow as a colleague. Over five months, the first half of it based at the Betty and Gordon Moore Library where I held training sessions and blogged, I saw the WikiCite initiative to get control of science bibliography take off as Tom Arrow's fatameh tool was exploited to the full by bot operators on Wikidata. Tom now works as a Wikidata contractor.

[Image: T Arrow at Wikimania 2017]

Wikidata's store of items about individual scientific papers shot up, from about half a million, and reached 5 million by August that year. It now tops 18 million, with over 150 million citation statements. In October 2017, I went to WikiDataCon in Berlin, my first Wikimedia scholarship.

There was a further Wikimedia grant to ContentMine for the ScienceSource project that started in June this year, centred on the Wikibase site at the ScienceSource Wiki. Wikibase is to Wikidata as MediaWiki is to Knowledge: it means essentially the same software as Wikidata, if without some features, but set up as an independent site and community. There is a Wikimedia UK blog post titled "Science Source seeks to improve reliable referencing on Knowledge's medical articles" which is about ScienceSource, along with a set of basic introductory videos.

[Image: The Betty and Gordon Moore Library, on the West Cambridge mathematics campus]

The underlying idea of ScienceSource is to be more systematic about searching the biomedical literature for medical facts that can be passed with good references to Wikidata. It applies text mining, as did WikiFactMine before it, but aims to bring it closer into the Wikimedia fold by posting the results to its Wikibase site, where SPARQL can be run over them. It will engage with WP:MEDRS, the major reliable sources guideline that applies to Knowledge's health information, again by use of SPARQL applied to metadata. Details aside, it is a project that could hardly have been conceived without the development of Wikidata and its supporting tools. Why not on Wikisource, you cry? Here's why:

The social realities

[Image: Group photo at the Cambridge Wikidata Workshop, 20 October 2018, in Makespace, 16 Mill Lane, Cambridge UK]

I've never been seriously involved with infoboxes, neither here nor from the Wikidata end, yet I have ended up in a project that takes for granted their role: when ScienceSource adds statements to Wikidata, they can appear in infoboxes on Wikipedias in 300 languages.

Last time I wrote for The Signpost about Wikidata, my major theme was the "integration" of Wikimedia sites, facilitated by Wikidata. The infobox mechanism should become infrastructure for that integration and, in the way of such things, ultimately be taken for granted. "Citation reform" here can in principle be carried out as an infrastructural project using the same family of techniques around Lua, though the social realities mean that the frictional forces may be a serious factor for delay.

From my current interests, I would single out the "SPARQL aggregate" as potentially having the same range of benefits for Wikimedia. SPARQL itself may become the first thought for issues of discoverability, because it can cope with disparate inputs as long as their relational structure is clear ("find me authors born in India, writing in English but with a Spanish mother"). What SPARQL aggregates do is to tack onto purely list-making queries any columns that may be computed spreadsheet-style from associated data. It appears to me a rather powerful model.
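To make the contrast between a list-making query and a SPARQL aggregate concrete, here is a minimal sketch in Python that only builds the two query strings and sends nothing over the network. It takes the matching question from the article (items carrying both an ODNB identifier and an Art UK artist identifier) and adds a computed column, a count per "sex or gender" value. The property numbers P1415, P1367 and P21 and the function names are assumptions made for this illustration, not taken from the article, and should be checked on Wikidata before use. The `wdt:` prefix is assumed to be predefined by the query endpoint.

```python
from string import Template

# Assumed Wikidata property IDs (illustrative only; verify before relying on them).
ODNB_ID = "P1415"           # assumed: Oxford Dictionary of National Biography ID
ART_UK_ARTIST_ID = "P1367"  # assumed: Art UK artist ID
SEX_OR_GENDER = "P21"       # assumed: sex or gender

# A purely list-making query: one row per item that has both identifiers.
LIST_QUERY = Template("""\
SELECT ?item WHERE {
  ?item wdt:$a ?idA ;
        wdt:$b ?idB .
}""")

# The same pattern with an aggregate: a computed count column per gender value.
AGGREGATE_QUERY = Template("""\
SELECT ?gender (COUNT(DISTINCT ?item) AS ?matches) WHERE {
  ?item wdt:$a ?idA ;
        wdt:$b ?idB ;
        wdt:$g ?gender .
}
GROUP BY ?gender
ORDER BY DESC(?matches)""")


def build_list_query(a: str = ODNB_ID, b: str = ART_UK_ARTIST_ID) -> str:
    """Return SPARQL text listing items that carry both identifier properties."""
    return LIST_QUERY.substitute(a=a, b=b)


def build_aggregate_query(a: str = ODNB_ID, b: str = ART_UK_ARTIST_ID,
                          g: str = SEX_OR_GENDER) -> str:
    """Return SPARQL text counting matched items, grouped by property g."""
    return AGGREGATE_QUERY.substitute(a=a, b=b, g=g)


if __name__ == "__main__":
    print(build_list_query())
    print()
    print(build_aggregate_query())
```

The printed text could be pasted into a SPARQL query editor. The only difference between the two queries is the extra grouping variable and the `COUNT` column, which is the "spreadsheet-style" addition described above.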

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
