Documenting Software Applications on Wikidata
Most of us have heard about Wikipedia and probably use it daily. However, many are still unaware of a much more recent Wikimedia Foundation project called Wikidata. Started in 2012, Wikidata is a sister project of Wikipedia that aims to be a knowledge base of facts backed by references. Wikipedia is intended primarily for human readers. Wikidata is structured, so both machines and humans can use it. This article explores how Wikidata can play a key role in documenting software.
Software Documentation on Wikipedia
Instead of focusing on individual applications, this article focuses on two popular free software desktop projects, KDE and GNOME, because each of them maintains a very large number of applications. Both have an international open-source community with contributors from across the world. These contributions take many forms, including development, testing, marketing, and documentation. Each project offers applications for games, education, development, graphics, multimedia, the internet, etc. Many of these applications have their own homepage, a dedicated page on the KDE or GNOME website, or both. Some applications are also documented on Wikipedia.
However, a careful look at how these applications are covered on Wikipedia shows that they are not equally documented in all language editions. The screenshots below show Wikipedia pages related to GNOME and KDE applications in different languages.
KDE Applications is a dedicated English Wikipedia page that lists applications by category. For example, the following screenshot shows the applications in the category ‘Production’. Not all of them link to a dedicated Wikipedia article about the application.
But this problem is not limited to KDE applications. All the core applications of GNOME are listed on the English Wikipedia page: GNOME Core Applications. Here too, we see that not all the applications have a dedicated page as seen in the screenshot below.
The Wikipedia template pages for KDE and GNOME applications give an even clearer view of the different applications. So far, we have focused only on English Wikipedia. The following sections also consider other language editions, such as Italian, Portuguese, and French.
Starting with English, the template page for KDE applications, Template:KDE, lists a large number of applications under different categories, with very few redlinks. A redlink is a link to a page that does not exist yet. Here, it marks an application that has no dedicated article or article section.
Let us now consider the Italian template page for the KDE applications: Template:KDE. We see several redlinks corresponding to the missing pages for the applications.
Moving on to the French page: Modèle:Palette KDE, we also see many redlinks.
The situation is the same on the Portuguese page: Predefinição:KDE.
The situation is no different for GNOME applications. The English Wikipedia template page, Template:GNOME, currently shows no redlinks, but this is not the case in other languages.
For example, the Portuguese Wikipedia template page for GNOME applications, Predefinição:GNOME, shows a large number of redlinks.
Documentation is difficult. No developer or company can provide documentation in every language its users speak natively. Adding to the problem, software constantly evolves. New versions are released from time to time, adding some features and removing others. Documenting this evolution in every language is a daunting task.
These problems have been known to developers and researchers for a long time. One possible solution is a central store that holds the key information about different software applications. This information can be translated automatically or with the help of contributors. Because the information is stored centrally, anything that appears in many places only needs to be translated once. For example, every application is described by the same kinds of information: homepage, logo, images, screenshots, software version, etc. The labels for these fields then only need to be translated once into each language. Wikidata provides such a central store.
Software Documentation on Wikidata
Wikidata is a free, open, linked, structured, collaborative, and multilingual knowledge base. Contributors can document concepts from any domain, not just software. They can also import existing information from the different language editions of Wikipedia. Every Wikidata article (also called an item) has several statements. Each statement combines three parts: subject-property-value. The subject is the item. The property is one kind of information, such as those discussed above: homepage, logo, images, screenshots, etc. The value is the value of that property for the item. Each statement should also be supported by one or more references, such as books or journal articles. If the information was imported from a Wikipedia page, this can be recorded as well.
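The following minimal Python sketch makes the subject-property-value model concrete by representing a statement as a small data class. It is a simplification. It assumes that values are plain strings, whereas real Wikidata statements also have typed values, qualifiers, and ranks. P856 (official website) is a real property, but the URL and the reference text below are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Statement:
    """A simplified Wikidata statement: subject, property, value, references."""

    subject: str  # item identifier (Q-number), e.g. "Q1765672" for Akregator
    prop: str  # property identifier (P-number), e.g. "P856" for official website
    value: str  # simplification: every value is stored as a plain string
    references: list[str] = field(default_factory=list)  # sources for the claim

    def is_referenced(self) -> bool:
        """True if at least one reference supports this statement."""
        return bool(self.references)


# Hypothetical example: the URL and the reference are placeholders, not real data.
homepage = Statement(
    subject="Q1765672",
    prop="P856",
    value="https://example.org/akregator",
    references=["imported from a Wikipedia page"],
)
print(homepage.subject, homepage.prop, homepage.value, homepage.is_referenced())
```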
Conversely, Wikipedia pages can reuse information from Wikidata. Some Wikipedia infobox templates are linked to Wikidata and fetch their data from it. The KDE page on Basque Wikipedia is one example.
Information related to Software
What kind of information about software can be documented on Wikidata? Each article on Wikidata is called an item page. Property pages are covered later. For example, the software Akregator has a page on Wikidata: https://www.wikidata.org/wiki/Q1765672. Here, Q1765672 is the identifier (or Q-number) of Akregator. The screenshot below shows this identifier along with the labels, descriptions, and aliases of the software. Contributors translate this information into different languages, which is why Wikidata is called a multilingual knowledge base.
Our next goal is to look at statements about Akregator and see how its Q-number (Q1765672) is used to record different facts. For example, to state that Akregator is an application, and more specifically a news aggregator, we use the property instance of.
As described above, repetitive information only needs to be translated once per language. Adding the parameter ?uselang=fr to the same page displays Akregator in French: https://www.wikidata.org/wiki/Q1765672?uselang=fr. Q1765672 remains the Q-number (or identifier) of Akregator in every language. The screenshot below shows the information about Akregator in French.
But what is instance of? This brings us to the second type of page on Wikidata: property pages. The property instance of has its own page, https://www.wikidata.org/wiki/Property:P31, where its translations can be found. The screenshot below shows the labels, descriptions, and aliases of P31, known as instance of in English. Properties are identified by their P-numbers. We use instance of to state that Akregator is an application (more precisely, that Akregator is an instance of an application).
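Item and property pages follow a predictable URL pattern, as the Akregator and P31 links above show. The Python sketch below builds such URLs and can optionally add the ?uselang parameter. The function name entity_page_url is hypothetical. The sketch also assumes that any identifier starting with P is a property.

```python
from typing import Optional
from urllib.parse import urlencode

WIKI_BASE = "https://www.wikidata.org/wiki/"


def entity_page_url(entity_id: str, lang: Optional[str] = None) -> str:
    """Return the Wikidata page URL for an item (Q...) or a property (P...)."""
    # Properties live in the Property: namespace; items are in the main namespace.
    title = f"Property:{entity_id}" if entity_id.startswith("P") else entity_id
    url = WIKI_BASE + title
    if lang:
        # ?uselang=xx shows the page in the given language, e.g. "fr" for French.
        url += "?" + urlencode({"uselang": lang})
    return url


print(entity_page_url("Q1765672", "fr"))  # French view of the Akregator item
print(entity_page_url("P31"))  # property page of instance of
```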
This brings us to the next important question: what information is needed to describe a software application, and which properties (P-numbers) express it? Some examples are given below:
- instance of P31
- image P18
- logo image P154
- GUI toolkit or framework P1414
- software version P348
- …
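These identifiers are just a letter followed by a number, so a script can keep a small lookup table of the properties above and tell items from properties. The following minimal Python sketch assumes that well-formed identifiers are Q or P followed by a positive integer. SOFTWARE_PROPERTIES and entity_kind are hypothetical names.

```python
import re

# P-numbers from the list above, keyed by their English labels.
SOFTWARE_PROPERTIES = {
    "instance of": "P31",
    "image": "P18",
    "logo image": "P154",
    "GUI toolkit or framework": "P1414",
    "software version": "P348",
}

# Assumed format: "Q" or "P" followed by a positive integer without leading zeros.
ID_PATTERN = re.compile(r"([QP])([1-9][0-9]*)")


def entity_kind(entity_id: str) -> str:
    """Classify an identifier as an item (Q-number) or a property (P-number)."""
    match = ID_PATTERN.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a Wikidata identifier: {entity_id!r}")
    return "item" if match.group(1) == "Q" else "property"


for label, pid in SOFTWARE_PROPERTIES.items():
    print(f"{label}: {pid} ({entity_kind(pid)})")
print(entity_kind("Q1765672"))  # Akregator is an item
```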
All those numbers! A newcomer may well be alarmed by them. To help, several WikiProjects on Wikidata are dedicated to different domains.
Wikidata WikiProjects
Wikidata WikiProjects help contributors identify the properties needed to document a software application or a subject from another domain. For example, the following projects identify properties for software, operating systems, programming languages, etc.
- WikiProject Informatics/Software
- WikiProject Informatics/Operating System
- WikiProject Informatics/Programming Language
- …
A screenshot of one such WikiProject is given below. It shows the different properties required to describe a software application.
How to edit?
The next important question is how to add new information to Wikidata. The first option is to add it manually. The following screenshot shows how a new statement is added. Here, a contributor wants to add the official website to a Wikidata page. After they type the first few characters, Wikidata suggests two matching properties. The contributor chooses one of them and adds the value.
Wikidata also supports adding statements with bots and tools. The following three tools are popular among contributors for adding new information and for modifying or deleting existing information.
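QuickStatements, for example, can create new Wikidata items. The following commands create an item for a piece of software, with English and French labels (Len, Lfr) and descriptions (Den, Dfr). Q? is a placeholder for the Q-number of the class the software is an instance of. In the QuickStatements V1 format, the fields of each command are separated by tabs.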
CREATE
LAST Len "Software Name"
LAST Lfr "Nom du logiciel"
LAST Den "description"
LAST Dfr "la description"
LAST P31 Q?
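When many applications need items, a script can generate the commands above instead of someone typing them by hand. The Python sketch below emits the same CREATE/LAST commands and assumes the tab-separated V1 format. quickstatements_create is a hypothetical helper. It rejects labels that contain double quotes rather than guessing how to escape them.

```python
from __future__ import annotations


def quickstatements_create(
    labels: dict[str, str], descriptions: dict[str, str], instance_of: str
) -> str:
    """Build QuickStatements V1 commands (tab-separated) that create one item."""
    lines = ["CREATE"]
    # L<lang> sets a label, D<lang> a description, on the item just created (LAST).
    for prefix, texts in (("L", labels), ("D", descriptions)):
        for lang, text in texts.items():
            if '"' in text:
                raise ValueError(f"double quote not supported in {text!r}")
            lines.append(f'LAST\t{prefix}{lang}\t"{text}"')
    # P31 = instance of; instance_of is the Q-number of the class.
    lines.append(f"LAST\tP31\t{instance_of}")
    return "\n".join(lines)


commands = quickstatements_create(
    labels={"en": "Software Name", "fr": "Nom du logiciel"},
    descriptions={"en": "description", "fr": "la description"},
    instance_of="Q?",  # placeholder kept from the example above
)
print(commands)
```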
HarvestTemplates imports data from Wikipedia templates such as infoboxes. It needs the name of the template, the name of the template parameter (e.g., logo), and the corresponding P-number (e.g., P154, logo image). The tool then takes the values from the Wikipedia pages that use the template and imports them into Wikidata.
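A small Python sketch can illustrate the idea behind HarvestTemplates. It reads the parameters of an infobox from wikitext and maps the chosen ones to P-numbers. This is not how the tool itself is implemented. The sketch assumes one parameter per line and ignores nested templates and links. It also uses a made-up infobox sample and a hypothetical parameter mapping.

```python
from __future__ import annotations

import re

# Hypothetical mapping from template parameter names to Wikidata properties.
PARAM_TO_PROPERTY = {"logo": "P154", "website": "P856"}

# Matches lines such as "| logo = Example.svg" (one parameter per line assumed).
PARAM_LINE = re.compile(r"^\s*\|\s*([^=|]+?)\s*=\s*(.*?)\s*$")


def harvest(wikitext: str, mapping: dict[str, str]) -> dict[str, str]:
    """Return {P-number: value} for the mapped parameters found in wikitext."""
    found = {}
    for line in wikitext.splitlines():
        match = PARAM_LINE.match(line)
        if match is None:
            continue
        name, value = match.groups()
        if name in mapping and value:  # skip unmapped or empty parameters
            found[mapping[name]] = value
    return found


# Made-up infobox for illustration only.
sample = """{{Infobox software
| name    = Akregator
| logo    = Example logo.svg
| website = https://example.org
}}"""
print(harvest(sample, PARAM_TO_PROPERTY))
```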
PetScan can find Wikipedia articles that match a set of categories and can also be used to edit Wikidata items. Finally, contributors can write their own bots using the MediaWiki API and libraries such as pywikibot.
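Bots usually read and write Wikidata through the MediaWiki API. The following minimal Python sketch builds a request URL for the wbgetentities action, asking for labels only, and extracts the labels from a parsed JSON response. It makes no network call. The sample response is hand-written to mirror the shape of the API's output, and the helper names are hypothetical. A real bot would fetch the URL (for example with urllib.request) or use pywikibot instead.

```python
from __future__ import annotations

from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def wbgetentities_url(ids: list[str], languages: list[str]) -> str:
    """Build a MediaWiki API URL that requests labels for the given entities."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),  # multiple values are separated by "|"
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def labels_from_response(response: dict, entity_id: str) -> dict[str, str]:
    """Map language codes to label text in a parsed wbgetentities response."""
    labels = response["entities"][entity_id].get("labels", {})
    return {lang: entry["value"] for lang, entry in labels.items()}


# Hand-written sample in the shape of a wbgetentities response (not fetched).
sample_response = {
    "entities": {
        "Q1765672": {
            "labels": {
                "en": {"language": "en", "value": "Akregator"},
                "fr": {"language": "fr", "value": "Akregator"},
            }
        }
    }
}
print(wbgetentities_url(["Q1765672"], ["en", "fr"]))
print(labels_from_response(sample_response, "Q1765672"))
```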
Conclusion
Wikipedia or Wikidata? The answer is both, because each plays an important role in documenting software. Since Wikidata is structured, not all information can be added to it. This is especially true when contributors want to compare two or more applications. That said, projects like Abstract Wikipedia show how Wikidata and Wikipedia can work together in the future. The integration between Wikipedia infoboxes and Wikidata shows further potential: infoboxes can fetch rapidly changing information, such as software versions, from Wikidata.
Wikidata can also be used to generate article text automatically. For example, the following sentence can be generated from the Akregator item on Wikidata. Contributors can produce similar sentences for other applications by using their items instead.
Akregator is a news aggregator application in KDE. It is developed with the Qt GUI framework.
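A sentence like this can be produced by filling a template with values taken from an item. The Python sketch below assumes that the relevant statements have already been fetched and their values resolved to English labels. The dictionary keys (label, kind, project, gui_toolkit) are hypothetical names, not Wikidata property names.

```python
from __future__ import annotations


def describe(facts: dict[str, str]) -> str:
    """Render a short English description from simplified, pre-resolved facts."""
    kind = facts["kind"]
    article = "an" if kind[0].lower() in "aeiou" else "a"  # crude a/an choice
    sentence = f"{facts['label']} is {article} {kind} application in {facts['project']}."
    if "gui_toolkit" in facts:  # optional, e.g. from GUI toolkit or framework (P1414)
        sentence += f" It is developed with the {facts['gui_toolkit']} GUI framework."
    return sentence


print(describe({
    "label": "Akregator",
    "kind": "news aggregator",
    "project": "KDE",
    "gui_toolkit": "Qt",
}))
```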
Finally, software applications evolve constantly, and no contributor can keep every Wikipedia page up to date. Together, automation and collaboration can play an important role in documenting software and in preserving digital heritage and linguistic diversity. They can help current and future generations learn about both current and legacy software in their native languages.
Q19279214! for reading this article.
There is nothing cryptic about the above sentence. It is just a Wikidata-style way of saying Gracias! Thank you! Merci!
References
- Wikidata in Wikipedia, Mike Peel, Wikimania 2016
- Wikimedia Commons
- MediaWiki API
- Article Placeholder
- Quickstatements
- HarvestTemplates
- pywikibot
- PetScan
Acknowledgments
The author would like to thank Wikimedia Hackathons and Wikidata community members for several discussions around these topics.
This is an abridged version of the talk Wikidata: Exploring more visibility and coverage of KDE Applications given by me at Akademy 2017 in Almeria on 22 July 2017.
Originally published at https://johnsamuel.info.