Using semantic web technologies to compare censuses

[Image: Norway and the Netherlands, 1910]

In this project we explored how semantic web technologies can be used to link together open data from various sources in hopes of providing some new insights.

As part of the Semantic Web course at the Vrije Universiteit in Amsterdam, Robin van der Markt and I worked on this project during one busy month. We wanted to explore similarities and differences between the Netherlands and Norway based on censuses from the 19th and 20th centuries. Both countries have made censuses from this period digitally available, but in different formats.

Based on the work done by the CEDAR project, many of the Dutch censuses are available in RDF. As this was not the case for the Norwegian data, we decided to convert the Norwegian datasets to RDF. Even though the Norwegian census from 1910 is available in JSON format online today, this did not fit our purpose, as there is a constraint on the number of results for each query. Thankfully the same data is available through the NAPP project.
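The conversion step can be sketched roughly as follows. This is a minimal illustration and not our actual converter: the `http://example.org/census/` namespace, the field names `id`, `municipality` and `occupation`, and the sample record are all assumptions made for the example. It writes N-Triples, one of the simplest RDF serializations, using only the Python standard library.

```python
# Minimal sketch: turn tabular census records into N-Triples lines.
# The namespace and the field names are hypothetical, not the real NAPP schema.
BASE = "http://example.org/census/"


def escape_literal(value):
    """Escape the characters that must be escaped inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record):
    """Return one N-Triples line per non-identifier field of a census record."""
    subject = f"<{BASE}person/{record['id']}>"
    lines = []
    for field, value in record.items():
        if field == "id":
            continue  # the id is used for the subject IRI, not as a property
        predicate = f"<{BASE}{field}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


if __name__ == "__main__":
    # Made-up sample record for illustration only.
    sample = {"id": "no1910-0001", "municipality": "Bergen", "occupation": "Barber"}
    for line in record_to_ntriples(sample):
        print(line)
```

A file produced this way can be loaded into most triple stores; a real conversion would also need typed literals and a properly designed vocabulary.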

With limited time, we decided to focus only on the occupational information in the censuses. Using the information available from the HISCO website, we were able to construct an “occupational link” between the two censuses, which are very different in format. This link was based on a dataset of more than 1600 different occupations from the HISCO website. But as the Dutch and Norwegian datasets were different from each other, we ended up with a mere 146 occupations that we could find in both datasets.
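The idea behind the occupational link can be illustrated with a small sketch: if each dataset maps a shared occupation code to its local occupation title, the link is simply the set of codes that occur on both sides. The function name, the codes and the titles below are made-up placeholders, not real HISCO entries or our actual data.

```python
# Minimal sketch: link two occupation vocabularies through shared codes.
# All codes and titles below are made-up placeholders, not real HISCO entries.

def link_by_hisco(dutch, norwegian):
    """Map each code present in both inputs to its (Dutch, Norwegian) titles."""
    shared = dutch.keys() & norwegian.keys()  # set intersection of the codes
    return {code: (dutch[code], norwegian[code]) for code in sorted(shared)}


if __name__ == "__main__":
    dutch = {"00001": "kapper", "00002": "mijnwerker", "00003": "turfsteker"}
    norwegian = {"00001": "barberer", "00002": "gruvearbeider", "00004": "fisker"}
    for code, (nl, no) in link_by_hisco(dutch, norwegian).items():
        print(code, nl, no)
```

Only occupations coded on both sides survive the intersection, which is why the number of linked occupations can end up much smaller than the full code list.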

Nevertheless, this was more than enough to show how semantic web technologies such as RDF(S) and SPARQL can easily connect data from different sources, as long as they are in the same format. We set up our own triple store using Fuseki, running on a private VPN, and could then run SPARQL queries against the different datasets. We used HTML5 and JavaScript to visualize the results as a heat map showing differences and similarities between the Netherlands and Norway.
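To give an impression of what querying such a triple store looks like, here is a minimal sketch in Python using only the standard library. The endpoint URL (a local Fuseki dataset named `census`), the `ex:` vocabulary and the query itself are assumptions for illustration, not the ones from our project. The request follows the standard SPARQL Protocol: the query is sent as a `query` parameter and the results are requested as SPARQL JSON.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and vocabulary; a local Fuseki dataset named "census" is assumed.
ENDPOINT = "http://localhost:3030/census/query"

QUERY = """
PREFIX ex: <http://example.org/census/>
SELECT ?municipality (COUNT(?person) AS ?total)
WHERE {
  ?person ex:occupation "Barber" ;
          ex:municipality ?municipality .
}
GROUP BY ?municipality
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_counts(results_json):
    """Turn SPARQL JSON results into a {municipality: count} dictionary."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return {row["municipality"]["value"]: int(row["total"]["value"]) for row in bindings}


def fetch_counts(endpoint=ENDPOINT, query=QUERY):
    """Run the query against the endpoint (requires a running triple store)."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return parse_counts(response.read().decode("utf-8"))
```

Per-municipality counts like these are the kind of input a heat map needs: each region is shaded according to its count.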

The application is currently available online: http://chefdev.nl/semanticweb/. Here you’ll find similarities between the countries, such as clusters of miners in more rural areas and barbers living mainly in the big cities. There are also differences: the Netherlands employed more people on the railways, while Norway (somewhat counterintuitively) had more people working as bicycle makers than the Netherlands. If you want to read about our project in more detail, the project report is available for download here: Final report SW 2013

If you are new to the semantic web, there are various sites we would recommend visiting, such as:

Authors:

Categories: census, Folketellinga 1910, fuseki, Open data, owl, rdf, rdfa, rdfs, Semantic web, sparql
