The executive order to provide open data was signed by the Russian President in 2012. Six years later, in 2018, the Digital Russia Studies research group at the University of Helsinki launched an investigation to map the situation. How much and what kind of data had by then been made available in the government data portals? The secondary aim of the project was to describe that information and to help researchers identify the best sources and the most usable data sets.
The project was completed by research assistant Ilona Repponen, an MA student of Translation Studies at the University of Helsinki.
— We chose 75 Russian executive organs for this study, Repponen explains. They included federal agencies and services, ministries and funds listed in the government structure. We wanted to study the character of open data portals and describe the contents using a linked ontology.
The data was gathered from various portals and in different formats. It was then carefully saved in spreadsheets and analysed in order to create an ontology that would cover all the sources of the study and be useful in analysing future data sets as well.
Structuring the data reveals what's there — and saves time
Structuring the data is vital for its usability. For this, Ilona Repponen came up with useful metadata, such as information on the type of organisation, open data availability, the number of data sets, file formats and the concepts describing the data sets. Researchers can now explore this preliminary information on data sets instead of wasting their time browsing the numerous open government data portals in a desperate attempt to find something useful.
Because desperate one might indeed become. At first glance there seems to be plenty of data available: approximately 49 data sets, both quantitative and qualitative, were published per portal. But what kind of data exactly, and how could it be reached?
— I quickly found out that the most common data sets were related to administration. Russian governmental bodies tend to release data sets of contact information, such as the names and work addresses of their staff. This means they only fulfil the minimum requirements of the 2012 order, Repponen notes.
This kind of data is not, of course, very interesting for research purposes. On the contrary, the amount of such trivial material might be enough to curb the enthusiasm of a budding digital humanist. And this is exactly where the linked ontology comes in handy.
— I downloaded the data sets, analysed their contents, and created an ontology of 37 concepts based on existing concept ontologies such as FINTO (the Finnish thesaurus and ontology service run by the National Library) and other open government data portal ontologies, says Repponen. Some concepts were independent, and some were designed to build a hierarchy.
The concepts included Administration, discussed above, but also Agriculture, fisheries and forestry; Arts, culture and heritage; Business; Construction; Crime and justice; Documentation; Economy and finances; Education; Energy; Environment; Events, and so on. The actual data sets might include registers of official documents and statistics on different topics. For example, the Ministry of Culture provides data sets such as catalogues of patriotic music and film distribution information, while the Federal Agency for Rail Transport has published data on the average salary of its staff. There are, thus, many data sets that could be of use to researchers in the humanities and social sciences, and many organisations have taken pains to obey not only the letter but also the spirit of the president’s order.
User-friendly formats
The majority of the data is published in a simple, user-friendly manner.
— What delighted me was that the most common format for the data was CSV, comma-separated values, notes Repponen. That’s a format that can be worked on using simple office software. When the data is available in formats such as JSON, that already raises the threshold for many scholars.
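For scholars who do run into JSON, the step back to a spreadsheet-friendly file can be fairly small. The following is a minimal sketch using only the Python standard library, and it is not part of the project described here. It assumes that the JSON file holds a flat list of records. The sample data, its field names and the function name json_records_to_csv are all hypothetical; real portal files may be nested and would need more handling.

```python
import csv
import io
import json

# Hypothetical sample data; real portal files will have other fields.
SAMPLE_JSON = """[
  {"name": "Department A", "address": "Street 1", "phone": "123"},
  {"name": "Department B", "address": "Street 2"}
]"""


def json_records_to_csv(json_text: str) -> str:
    """Convert a flat JSON list of records into CSV text."""
    records = json.loads(json_text)

    # Collect every key that occurs, in first-seen order,
    # so that records with missing fields still fit the table.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",           # leave missing fields empty
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = json_records_to_csv(SAMPLE_JSON)
print(csv_text)
```

The resulting text can be saved with a .csv extension and opened in ordinary office software.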
Some public actors have even gone so far as to publish a downloadable list of available data sets on their site, and some provide a search engine on their portals. It is as yet too early to say where the situation will go from here. Will new and more useful data sets be made open? Will the ones now published be maintained and developed further? In any case, the data literacy of researchers and their willingness to make use of the data provided are something that the DRS network will continue working to improve.
The linked ontology created by Ilona Repponen has been published on the Linked Data Finland website, along with links to the resources and metadata.