(WDQ). Not only that, but matching implies you can solve the disambiguation question for people, and in the worst cases that can be really hard. When a common name, say "William Smith", came up, you were faced with a list that could run to several pages of hits on the exact name. Typically most of those items were undescribed, and very sparse when it came to biographical facts.
" statement highlighting the topic. Run queries on such statements and you have a language-independent mechanism for finding material you want on Wikisource. Bots started posting Wikidata items for all the articles in big works such as the DNB (around 30,000 of them), and in doing so created outline entries for them in Wikidata.
. This liberal interpretation of notability that came on stream in 2015 had some odd effects, when minor figures from other catalogs got items, but now that Wikidata is at the scale of 50 million items one hardly sees it as causing genuine problems. Typical databases in the cultural sector, for example the
, the major reliable sources guideline that applies to Knowledge's health information, again by use of SPARQL applied to metadata. Details aside, it is a project that could hardly have been conceived without the development of Wikidata and its supporting tools. Why not on Wikisource, you cry? Here's why:
and bring up an answer: as of this writing, Wikidata knows about 2088 of these matches. At the time, the two British cultural institutions were interested in this question, and were going at it by traditional methods. I also had at the back of my mind another problem: how many ODNB women were missing
pokes around in gallery storage looking for "old masters". Wikidata allows easier access to minor artists, with less dust and fewer cobwebs, since over 22,000 Art UK artists, some 60% of the total, are represented. These can often be identified and merged with other items, though the scholarly challenges
In 2017, I was rewarded for investing so much of my time: once I had made a key advance in my SPARQL understanding, I was able to write queries that removed the need for the patrolling I did on Wikisource to see which Knowledge articles covered the DNB topics. In a neat plot twist, it was a tool for this
From my current interests, I would single out the "SPARQL aggregate" as potentially having the same range of benefits for Wikimedia. SPARQL itself may become the first thought for issues of discoverability, because it can cope with disparate inputs as long as their relational structure is clear
about Wikidata, my major theme was the "integration" of Wikimedia sites, facilitated by Wikidata. The infobox mechanism should become infrastructure for that integration and, in the way of such things, ultimately be taken for granted. "Citation reform" here can in principle be carried out as an
I took up this direction in 2016, completing the identification of main subjects for the 30,000 metadata items of the DNB. I remember it being incredibly hard graft, even though the previous round of ODNB work meant that all the subjects of the biographies were already there. A proper matching
Much heavy lifting was going on in early 2015. At that time the mix'n'match tool in gamified mode was my usual approach; improved automatching has since taken some of the fun, and most of the low-hanging fruit, from that mode of using it. Importantly, as the Wikidata community relaxed its view on
SPARQL users, I should say parenthetically, form a good and collaborative community in my experience. I use the amazing full text search in mix'n'match most days for biographical research, and it too has functioning communities, the matchers and the uploaders of datasets. The more recent
, I can do my patrolling without much effort, and so thank those here who create articles on DNB topics. The articles show up once they have a Wikidata item (caveat here about some needed merging), which frees my time to be better spent working on the backlog of articles that don't.
("find me authors born in India, writing in English but with a Spanish mother"). What SPARQL aggregates do is to tack onto purely list-making queries any columns that may be computed spreadsheet-style from associated data. It appears to me a rather powerful model.
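The aggregate idea can be sketched outside SPARQL itself. The following Python fragment is only an analogy, under stated assumptions: the rows, the field names and the `aggregate_by_author` helper are all hypothetical, standing in for what a list-making query might return, and the grouping mimics what `GROUP BY` with `COUNT` and `MIN` does in a SPARQL 1.1 query. It is not taken from any query used in the work described here.

```python
from collections import defaultdict

# Hypothetical rows, as a plain list-making query might return them:
# one row per (author, work, year) match. These are not real Wikidata results.
ROWS = [
    {"author": "Author A", "work": "Work 1", "year": 1891},
    {"author": "Author A", "work": "Work 2", "year": 1885},
    {"author": "Author B", "work": "Work 3", "year": 1902},
]


def aggregate_by_author(rows):
    """Group rows by author and compute spreadsheet-style extra columns.

    Mimics SPARQL's GROUP BY ?author with COUNT(?work) and MIN(?year).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["author"]].append(row)

    summary = []
    for author in sorted(groups):
        members = groups[author]
        summary.append(
            {
                "author": author,
                "works": len(members),  # analogue of COUNT(?work)
                "earliest": min(r["year"] for r in members),  # analogue of MIN(?year)
            }
        )
    return summary


if __name__ == "__main__":
    for line in aggregate_by_author(ROWS):
        print(line)
```

In SPARQL the same shape would come from adding aggregate expressions to the `SELECT` clause of the list query; the sketch only shows that the extra columns are computed per group from the associated data.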
It was not, however, until after Wikimania 2014 in London that I was really drawn into Wikidata editing. That was by a problem to solve, namely how many ODNB biographies were of BBC Your Paintings artists. These days I take it for granted that I can write an
issue with Wikisource, where the French version alone has a million texts. How does one locate texts on a given topic? The category system is not really designed for that, and in any case is used inconsistently by the various languages.
" for pictures), and therefore has the potential to take some of the grind out of the discovery trail. Specialised software is a large factor in the development of Wikidata, not just bots, though they still play a massive role as well.
. Wikibase is to Wikidata as MediaWiki is to Knowledge: it means essentially the same software as Wikidata, if without some features, but set up as an independent site and community. There is a Wikimedia UK blogpost titled "
In mid-2015, I pushed through the final stages of ODNB matching into Wikidata and, with Magnus, I helped select which further Wikidata items for Art UK artists should be created. The very interesting BBC television series
, in Cambridge, working on the ScienceSource project. He has created over 12,000 articles on the English Knowledge, where he has made over 300,000 edits from a global count of almost one million edits to Wikimedia projects.
, neither here nor from the Wikidata end, yet I have ended up in a project that takes for granted their role: when ScienceSource adds statements to Wikidata, they can appear in infoboxes on Wikipedias in 300 languages.
in a Wikidata merge can be quite serious (and instructive) just because the notability standards are quite relaxed. Consequently, I have been in a number of meetings with Art UK, explaining Wikidata.
falls on 29 October, and will be celebrated by 34 Wikimedia events worldwide. Last year, indeed, there was WikiDataCon in Berlin. This time around, the cakes will be distributed far and wide.
notability, non-matched catalog entries were no longer just parked for later consideration: either they were created as items at once, or they were marked "N/A" and left on the back-burner.
process got into side issues and required, for good conscience, large amounts of cleanup with plenty of merging of duplicates. Fortunately, merging is easier on Wikidata than on Knowledge.
So arose the project of creating items for all ODNB entries. In other words, all ODNB topics, around 60,000 of them, which include academic arcana such as "Women in trade and industry in
The underlying idea of ScienceSource is to be more systematic about searching the biomedical literature for medical facts that can be passed with good references to Wikidata. It applies
It turned out in 2015 that Wikidata potentially could solve this problem. Any Wikisource text, however short, warrants its own Wikidata item about the text itself. That item can have a "
, as did WikiFactMine before it, but aims to bring it closer into the Wikimedia fold by posting the results to its Wikibase site, where SPARQL can be run over them. It will engage with
did early spadework there, with Magnus providing the tech support, and we three began a long series of emails, wondering about getting other biographical datasets into mix'n'match.
So, as Wikidata turned two, I started to put time into the particular biographical area that was being opened up. By 2018 standards this was still pioneer stuff. There was no
Wikidata's store of items about individual scientific papers shot up, from about half a million, and reached 5 million by August that year. It now tops 18 million, with
's, contain many sparse entries. Worse, they often contain entries that one cannot match because no adequate identification is provided. Wikidatans do delete such things.
There was a further Wikimedia grant to ContentMine for the ScienceSource project that started in June this year, centred on the Wikibase site at the
To answer the "missing women from ODNB" question, there was the matter of filling in the "sex or gender" field, and then writing some standard
kind of patrolling that comes into my anecdote about the origin of mix'n'match above. I learned the facet of SPARQL I needed from
in 2013, though in nothing like today's form. After I made a feature request for a Wikisource tool at a meetup, he replied "
My earliest Wikidata edits had slipped my mind. It turns out that what I added initially was the first actual statement to
. Back in 2010, I gave a talk based on about a year of proof-reading the DNB at the Annual General Meeting of
tool was exploited to the full by bot operators on Wikidata. Tom now works as a Wikidata contractor.
with the first edition and English Knowledge, matching the ODNB on Wikidata was a natural project.
, though the social realities mean that the frictional forces may be a serious factor for delay.
, I saw the WikiCite initiative to get control of science bibliography take off as Tom Arrow's
. When that item had only sitelinks to Wikipedias, I linked it also to Churchill's father
Some people, I suppose, will still not buy into the acclaim. Here's a personal story.
. In October 2017, I went to WikiDataCon in Berlin, my first Wikimedia scholarship.
English Wikisource, growth of proof-reading directly against scans in the early 2010s
Science Source seeks to improve reliable referencing on Knowledge's medical articles
, an unconventional Cambridge tech startup, as Wikimedian in Residence for the
", which is now undeniable. In any case, the initial datasets on the tool were
. I could see that the answer was about 2,000. The advance from the number to
The Betty and Gordon Moore Library, on the West Cambridge mathematics campus
meetup, which was probably the reason why I thought I should take a look.
Charles Matthews began regular Knowledge editing in 2003. He is currently
as a colleague. Over five months, the first half of it based at the
infrastructural project using the same family of techniques around
(DNB), specifically the Victorian version edited initially by
, 20 October 2018, in Makespace, 16 Mill Lane, Cambridge UK
, in February 2013. It was a few days after I had set up a
makes Art UK's work vividly accessible, as the presenter
Now Wikidata is six: SPARQL adds sparkle to WMF projects.
tool, one of Wikidata's huge successes, was written by
" which is about ScienceSource, along with a set of
person ID, that started off under the older name "
Andrew's pair of Wikidata blogposts from 2014
)" and its 14 examples, would be considered
As an extension of what I had been doing on
yet, though there was Magnus's substitute
supported by the Foundation. There I had
I've never been seriously involved with
supports main subject work (including "
Oxford Dictionary of National Biography
over 150 million citation statements
held training sessions and blogged
In April 2017, I started work at
Dictionary of National Biography
I came to the ODNB through the
Betty and Gordon Moore Library
WikiFactMine and ScienceSource
from the English Knowledge?
Cambridge Wikidata Workshop
. There was and still is a
Britain's Lost Masterpieces
(centre) with an award for
basic introductory videos
I get started on Wikidata
Wikidata's sixth birthday
When Wikidata was three:
Wikimedian in Residence
Last time I wrote for
, and I found the DNB
an actual redlink list
Wikidata Query Service
. With the help of a
The social realities
WikiFactMine project
I have a better idea
Group photo at the
, was still major.
Andrew Gray in 2014
had asked me that.
Now Wikidata is six
ScienceSource Wiki
give the flavour.
BBC Your Paintings
at Wikimania 2017
TopicMatcher tool
easy SPARQL query
Winston Churchill
Bendor Grosvenor
Wikidata-notable
Becoming serious
Charles Matthews
Special report 2
discoverability
Carbon Caryatid
28 October 2018
Leslie Stephen
More discovery
British Museum
Petscan query
on Wikisource
Magnus Manske
Magnus Manske
The Signpost
main subject
Wikimedia UK
(ODNB); and
text mining
ContentMine
ListeriaBot
Andrew Gray
, from the
mix'n'match
ContentMine
created by
Wikisource
, now the
infoboxes
catalog 2
catalog 1
Cambridge
WP:MEDRS
where I
Randolph
the item
Wikidata
T Arrow
fatameh
T Arrow
depicts
Jheald
SPARQL
SPARQL
Art UK
the
1500
– c.
1300
act.
York
The
Lua
".
on
at
By
.
,
c.
(
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.