A Wikidata project to make life sciences research data available across languages is set to boost research sharing in the international scientific community and ensure greater data consistency across Wikipedia pages in different languages.
WikiProject Molecular Biology is intended to be a centralised resource for data on everything from genetics to pharmaceuticals.
“Our aim is to make Wikidata the central hub of life sciences data on genes, proteins, diseases and drugs, and the relationships between them,” said Andra Waagmeester, semantic data expert and member of the WikiProject.
“Our approach is to take a lot of resources on these topics and import them into Wikidata, and not only import them but also maintain them through regular updates. This has two effects: research data that is locked away in different data silos can now be used to populate Wikipedia articles, and, because Wikidata has a single API, scientists can start reproducing and sharing the information in Wikidata themselves.
“So you can see that it not only spreads scientific knowledge and allows proper scrutiny of scientists’ findings, but also provides a strong infrastructure for scientists to share their own results.”
Language barriers can be a significant problem in scientific research, with discoveries sometimes failing to be disseminated and thus needing to be rediscovered in other countries.
An example given by Waagmeester is arsenic trioxide, a chemical compound that a 2013 study found to be an effective alternative to supplemental chemotherapy in treating a certain type of leukemia.
“What’s intriguing about this story is that although the paper was published in 2013, the curative effect of arsenic trioxide had already been known in China for decades,” he added.
“Only, it didn’t reach the English-speaking world for two simple reasons: the scientists who noticed it didn’t speak English, and the findings were published in a scientific journal that was obscure even to most Chinese readers.”
Had the research been more widely known, this alternative treatment could have been life-saving.
“It showed that keeping data closed can actually harm the treatment of other patients. So now we argue that Wikipedia could solve that,” added Waagmeester.
There are also inconsistencies about particular diseases and treatments between different language versions of the same Wikipedia page, which the project hopes to resolve. However, this problem is not unique to the life sciences: the population figure given for Aruba, for example, differs depending on the language you view the article in, an issue that the wider Wikidata project is set to tackle.
“The Wikimedia Foundation came up with another project that just had its first birthday last week, which is called Wikidata. Wikidata is a linked database whose data is editable by humans and machines, just as Wikipedia articles are. It’s actually the Wikipedia model applied to data,” said Waagmeester.
“Now we can actually start writing Wikipedia articles where both the Chinese audience and the English-speaking audience could mine the data that’s available in Wikidata, potentially breaching the language barrier in science.”