Wikidata:Tools/Paulina/Software Documentation
This page details Paulina's software architecture, as well as useful information to contribute to the project in a technical way, and steps to install, run and customize the application.
Paulina is a web application developed in Python using the Flask web framework. Data is obtained through the different APIs available: the MediaWiki Action API, the Wikibase REST API and the Wikidata Query Service.
Requirements
- Python 3.11 or later.
- Python third-party modules: flask, Flask-Babel, python-dateutil, requests.
- A basic Flask WSGI webservice. Learn how to set it up in Toolforge by following steps 1 and 2 of Wikitech's Flask tool step-by-step guide. In other hosting environments the steps are different; PythonAnywhere, for example, has its own guide for setting up a Flask application.
Application files
This is the Paulina software repository file tree:
paulina
├── .gitignore
├── app.py
├── babel.cfg
├── config-sample.py
├── languages.py
├── LICENSE
├── messages.pot
├── pdclasses.py
├── README.md
├── requirements.txt
├── search.py
├── squeries.py
├── static
│   ├── css
│   │   └── main.css
│   ├── fonts
│   │   └── ...
│   └── images
│       └── ...
├── templates
│   ├── 404.html
│   ├── about.html
│   ├── author.html
│   ├── countries.html
│   ├── country.html
│   ├── home.html
│   ├── layout.html
│   ├── no-results.html
│   ├── results.html
│   ├── term.html
│   ├── work.html
│   └── works-list.html
└── translations
    ├── bn
    │   └── LC_MESSAGES
    │       ├── messages.mo
    │       └── messages.po
    └── ...
app.py
app.py is the root file of the software.
This file:
- Initializes the main application.
- Initializes Babel, the application responsible for internationalization (i18n).
- Loads the config file.
app.py also defines all the routes, with the HTML templates that will be displayed for each URL and the data that will populate those templates.
It also includes a small function that allows you to handle special types of data.
search.py
search.py includes three functions. The two main ones are:
- search_author()
- search_work()
These two functions receive the user's input and connect to the Wikidata API to search for Wikidata items with CirrusSearch.
search_author() includes the filter haswbstatement:P31=Q5|P31=Q21070568 (for human beings) in the CirrusSearch query itself, while search_work() takes the results from CirrusSearch and filters them with a query that keeps only the items that have creator (P170), author (P50), director (P57) and/or other similar properties.
The search_filters() function is a helper called from the two previous functions to add the filters chosen by the user to the search.
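As a rough illustration of how such a search could be assembled, the sketch below builds a MediaWiki Action API full-text search URL (action=query, list=search) with the haswbstatement filter appended to the user's input. The function name build_author_search_url and its parameters are hypothetical and are not taken from search.py; no request is actually sent.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"
# Filter quoted in the text above: instance of (P31) Q5 or Q21070568.
HUMAN_FILTER = "haswbstatement:P31=Q5|P31=Q21070568"


def build_author_search_url(user_input, extra_filters=(), limit=20):
    """Return an Action API URL that runs a CirrusSearch query for authors.

    Hypothetical helper: the real search_author() may be organised differently.
    """
    # CirrusSearch keywords are simply appended to the free-text query.
    terms = [user_input.strip(), HUMAN_FILTER, *extra_filters]
    params = {
        "action": "query",
        "list": "search",
        "srsearch": " ".join(term for term in terms if term),
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_author_search_url("Delmira Agustini", ["haswbstatement:P27=Q77"])
print(url)
```

Keeping the filters as a list of separate keywords makes it straightforward to add or remove one without touching the rest of the query string.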
pdclasses.py
pdclasses.py contains the different types of objects (classes) that Paulina has: Author, Work, Country and Term. Each type of object has its own attributes that describe the entity and are shown on the page dedicated to that entity. The classes also have methods that perform actions with the entities (for example, calculating the age of an author, retrieving their works and inferring whether those works are in the public domain).
Objects are created from the Q ID of a Wikidata item. The function retrieve_item(), used by all classes, connects to the Wikibase REST API and requests the information of the Wikidata item. Depending on the case, it requests the complete information of the item or only the labels. Some attributes and methods require extra information that is not contained in the individual Wikidata item, so they run queries.
Other functions in this module, build_field() and build_date(), handle special data types, such as unspecified values and dates, and also identify the preferred value of a property if one exists.
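The two ideas above (full item versus labels only, and preferring the preferred value) can be sketched as follows. This is only an illustration: the helper names item_url and best_values are hypothetical, and the statement layout (rank, value.type, value.content) is assumed to follow the JSON returned by the Wikibase REST API, so it is not necessarily how retrieve_item() or build_field() are written.

```python
REST_BASE = "https://www.wikidata.org/w/rest.php/wikibase/v1"


def item_url(qid, labels_only=False):
    """Build the Wikibase REST API URL for an item, or for its labels only."""
    url = f"{REST_BASE}/entities/items/{qid}"
    return f"{url}/labels" if labels_only else url


def best_values(statements):
    """Return the values of the preferred statements, else the normal ones.

    Unknown ("somevalue") and no-value ("novalue") statements are returned
    as the placeholder strings below instead of raising an error.
    """
    placeholders = {"somevalue": "unknown value", "novalue": "no value"}
    for rank in ("preferred", "normal"):
        ranked = [s for s in statements if s.get("rank") == rank]
        if ranked:
            return [
                placeholders.get(s["value"]["type"], s["value"].get("content"))
                for s in ranked
            ]
    return []


# Made-up statements for a single property, in the assumed layout.
sample = [
    {"rank": "normal", "value": {"type": "value", "content": "Q77"}},
    {"rank": "preferred", "value": {"type": "value", "content": "Q414"}},
    {"rank": "deprecated", "value": {"type": "somevalue"}},
]
print(best_values(sample))  # only the preferred statement is kept
```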
squeries.py
squeries.py is imported by the other modules that run Wikidata queries. This module contains the templates of all the SPARQL queries in the application.
squeries.py also includes a function, retrieve_query(), that connects to the Wikidata Query Service and requests the data in JSON format. This function has one required argument, which is the name of the query template, and accepts one or more keyword arguments that correspond to the variables that customize the template.
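A minimal sketch of this template-plus-keyword-arguments pattern is shown below, assuming the templates can be represented with string.Template. The template name works_by_author, the helper render_query and the function retrieve_query_sketch are hypothetical; the real templates and the real retrieve_query() in squeries.py may look quite different.

```python
import json
from string import Template
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical template; the real ones live in squeries.py.
QUERY_TEMPLATES = {
    "works_by_author": Template(
        "SELECT ?work WHERE { ?work wdt:P50 wd:$author . } LIMIT $limit"
    ),
}


def render_query(name, **variables):
    """Fill the named template with the given keyword arguments."""
    return QUERY_TEMPLATES[name].substitute(**variables)


def retrieve_query_sketch(name, **variables):
    """Send the rendered query to the Wikidata Query Service, return the rows."""
    params = urlencode({"query": render_query(name, **variables), "format": "json"})
    request = Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={"User-Agent": "paulina-docs-example/0.1"},  # placeholder agent
    )
    with urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


print(render_query("works_by_author", author="Q42", limit=10))
```

Separating the rendering step from the network call makes the templates easy to check without contacting the query service.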
languages.py
languages.py deals with language-related issues. First, it includes a list of supported languages taken from Wikidata (only two-character codes). Paulina can potentially take and display data from Wikidata in all those languages.
The module also includes a dictionary of already translated languages (see Translations), with their name and code. These selected languages are displayed in the language selector of the web application.
Finally, languages.py has a function that takes the language selected by the user (included in the "language" URL parameter) and stores it in a session cookie (see Creating config.py file) that has the sole purpose of making the language choice persistent throughout the session. If the user did not choose a language, the language is taken from the browser's Accept-Language HTTP header. If that header does not exist, the default language is English.
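The fallback order described above (URL parameter, then session, then Accept-Language, then English) can be sketched with plain Python as follows. The function names and the use of an ordinary dict in place of the Flask session are assumptions made for illustration, not the actual code of languages.py.

```python
DEFAULT_LANGUAGE = "en"


def parse_accept_language(header):
    """Return language codes from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        code, _, params = part.strip().partition(";")
        if not code:
            continue
        quality = 1.0
        if params.strip().startswith("q="):
            try:
                quality = float(params.strip()[2:])
            except ValueError:
                quality = 0.0
        # Keep only the primary subtag, e.g. "es" from "es-UY".
        weighted.append((quality, code.split("-")[0].lower()))
    # sorted() is stable, so equal weights keep their header order.
    return [code for _, code in sorted(weighted, key=lambda item: -item[0])]


def choose_language(url_param, session, accept_header, translated):
    """Pick the interface language and remember it in the session mapping."""
    if url_param in translated:
        session["language"] = url_param
    if session.get("language") in translated:
        return session["language"]
    for code in parse_accept_language(accept_header or ""):
        if code in translated:
            return code
    return DEFAULT_LANGUAGE


session = {}  # stands in for the Flask session
translated = {"en", "es", "bn"}
print(choose_language("es", session, "fr;q=0.9", translated))
print(choose_language(None, session, "", translated))
```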
config-sample.py
config-sample.py is a template for creating config.py, which is the actual file used by the application for configuration purposes. See Creating config.py file.
babel.cfg
babel.cfg is the i18n Babel application configuration file.
messages.pot
messages.pot is the Babel-generated template for translations.
requirements.txt
requirements.txt contains the third-party modules listed in Requirements. They can be installed by running this command from the root folder of the application:
pip install -r requirements.txt
.gitignore
.gitignore is a file used to specify untracked files when working with the Git version control system.
static
The folder called /static/ includes three subfolders:
- css: contains the main.css file, a set of styles that override those of Bootstrap, the CSS framework used by Paulina.
- fonts: Paulina uses the Lato typeface. This folder includes all the variants used of this typeface.
- images: contains all the images used in the application, including several versions of the logo and icons.
templates
The folder called /templates/ contains all the HTML templates used in the application. These templates are used to create web pages dynamically with the Jinja web templating engine included in Flask. For example, there is an author.html template that is used to create the information page for any author.
There is also a basic template, called layout.html, that includes the <head> metadata (the favicon, Open Graph meta tags, the link to Bootstrap CDN, among other things), the header, footer, and other basic aspects present in all pages. This layout template is inherited by all other templates.
Among other things, Jinja makes it possible to include information stored in variables in the HTML file and to run loops that generate HTML code dynamically. This technique is used, for example, in the results.html template to display an indefinite number of search results, and in the works-list.html template to display the list of works by an author.
Learn more about Jinja in the Jinja Template Designer Documentation.
translations
The folder called /translations/ contains subfolders with the translations into different languages. The structure of these subfolders is an i18n standard and is automatically generated by the Babel application. For each language there is a messages.po file containing the translated strings in plain text, and a messages.mo file containing the same strings compiled into a binary file.
Learn more about how to use Babel with Flask in the chapter i18n and L10n of Miguel Grinberg's excellent Flask tutorial.
Translations are done collaboratively on Weblate (see Translations).
Creating config.py file
You must copy the config-sample.py file to config.py. In this file you must uncomment the first line and enter a secret key, e.g.:
SECRET_KEY = "YourSecretKeyHere"
The secret key is for the session cookie. The session cookie is necessary so that the language choice is persistent throughout the session.
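One possible way to obtain a suitably random value for SECRET_KEY is Python's standard secrets module. The snippet below is only a suggestion and is not part of the Paulina repository; it prints a line that can be pasted into config.py.

```python
import secrets

# 32 random bytes rendered as 64 hexadecimal characters.
secret_key = secrets.token_hex(32)
print(f'SECRET_KEY = "{secret_key}"')
```

The key should be kept out of version control, since anyone who knows it can forge the session cookie.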
Paulina Local
Paulina Local is a fork of the main Paulina repository. It contains adaptations with respect to the main repository to facilitate the implementation of local or thematic versions of the tool.
The most important difference with respect to the main repository is the addition of a file in the root folder of the app, local_settings.json, which contains the custom information. In the main version of Paulina, the logo, the title of the site and the texts of the home page, among other data, are hardcoded in the html templates. In Paulina Local, however, this data is taken from local_settings.json.
local_settings.json can be modified by hand from the version available in the Paulina Local repository. Another possibility is to delete the file when installing the application on the server. If the file is not on the server, the first time we enter the site the interface will redirect us to a web form where we can fill in this data, after which the file will be created automatically.
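The "file present or not" check described above could look roughly like the sketch below. The function name load_local_settings and the key site_title used in the usage note are hypothetical; the actual keys and loading code are defined in the Paulina Local repository.

```python
import json
from pathlib import Path


def load_local_settings(path="local_settings.json"):
    """Return the custom settings as a dict, or None if the file is missing."""
    settings_file = Path(path)
    if not settings_file.is_file():
        # In this case the caller would redirect to the setup web form.
        return None
    with settings_file.open(encoding="utf-8") as handle:
        return json.load(handle)
```

A caller could then do something like settings = load_local_settings() and, if the result is None, send the visitor to the form; otherwise it would read values such as settings["site_title"] (an assumed key) when rendering the templates.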
Another important issue in local or thematic versions is the need to adapt the main search results. For example, the first implementation of Paulina Local, called Dominio Público Uruguay (available at https://dominiopublico.uy), is a national portal focused on Uruguay's cultural heritage. Thus, the main search for authors and works must be filtered to show only results for Uruguayan authors and works.
This is achieved by modifying the search.py file, in particular the search_author() and search_work() functions.
For example, in the main Paulina application, search_author() contains this line with a filter for human beings:
human_being_filter = "%20haswbstatement:P31=Q5|P31=Q21070568"
In the Dominio Público Uruguay repository, that line was changed. In addition to the human being filter, now it contains two more filters: 1) a filter for items that have IDs of catalogs of authors from Uruguay, and 2) a filter for Uruguayan nationality:
predefined_filters = ("%20haswbstatement:P31=Q5|P31=Q21070568"  # Is a human being
                      "%20haswbstatement:P12595|P2558|P6156"  # Has an ID from BNU, autores.uy or MNAV
                      "%20haswbstatement:P27=Q77")  # Has Uruguayan nationality
As for the search_work() function, at a certain point it runs the query query_for_works_from_search_results, available in the squeries.py module. Compared to the main Paulina application, the Dominio Público Uruguay repository has an extra line added to that query:
?author wdt:P27 wd:Q77 .
meaning "author of the work has country of citizenship (P27) -> Uruguay (Q77)".
These or other changes can be made to filter by default the main search results of the Paulina Local custom implementation by country, region, type of work, language, etc.
Contribute
Issues: Request a feature, report a bug or suggest other tasks on the Paulina Phabricator project.
Merge requests: Merge requests can be submitted to Paulina's repository on Wikimedia's GitLab.
Translations
You can help translate Paulina on Weblate.
License
This software is licensed under the GNU Affero General Public License, version 3.
Code repositories
- Paulina (main application)
- Paulina Local (fork of the main app for custom implementations)