
Last week, the Government of Canada’s new “Language Portal of Canada” website popped up. It’s a site aimed at providing Canadians with “free access to the language tools that will enable them to use and understand both official languages more easily.”
What’s most interesting about this is that Termium Plus, the government’s gigantic translation dataset, is now freely accessible. Termium has long been a resource for public servants who couldn’t remember the official translations for program, agency or committee names, or who needed to know the other-language equivalents for technical terminology, or wanted to find out what obscure bureaucratic acronyms stood for.
How huge is this resource? Four million terms in English, French and, more recently, Spanish — that’s a gigantic pile of info. And it’s constantly being updated, with 50,000 modifications to the database annually.
Until last week, if you weren’t a GoC employee, you could only access Termium via an annual subscription that (if memory serves) was in the hundreds of dollars. This pretty much limited its use to professional translators and editors who could write it off as an expense. But now, anyone can get at it for free.
And if you’ve already paid your subscription for this year? You’ll get a refund:
Have you paid a subscription fee for TERMIUM Plus® this year?
If so, read this message. If you paid to subscribe to TERMIUM Plus® and are eligible for a refund, you will receive a cheque in the mail by December 31, 2009. You do not need to contact the Translation Bureau: we know how to get in touch with you.
Thank you for your loyalty throughout the years.
So it’s great that such a huge government resource is now freely accessible to everyone. But why not take it further?
Here are some ideas for future development:
- offer up access to the database in machine-readable form so that third party developers can build interesting and possibly useful things from the dataset.
- allow public contributions to the data set, so that it can be expanded and improved in an efficient way.
- loosen licensing restrictions to maximize how far and wide the data can be reproduced and re-used.
(h/t David Eaves, whose three laws of open data provide a convenient framework for thinking about how to develop government databases like Termium.)
Aside: Termium records are presented in a rather highly evolved interface, with URLs that are full of parameters. I wonder if the records can be spidered or indexed by search engines?
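For anyone curious about that last question, here is a rough sketch of how you might poke at it with Python's standard library: pull apart a parameter-heavy URL and test it against a site's robots.txt rules. To be clear, the URL, the parameter names and the robots.txt rules below are all made up for illustration; I haven't checked what the real Termium site actually serves.

```python
from urllib.parse import urlsplit, parse_qs
from urllib.robotparser import RobotFileParser

# Hypothetical record URL: host, path and parameter names are invented
# for illustration and are NOT the real Termium URL scheme.
RECORD_URL = "https://www.example.org/record?lang=eng&index=alt&term=translation"

# Hypothetical robots.txt content, NOT the real site's rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def query_parameters(url):
    """Return the query-string parameters of a URL as a dict of lists."""
    return parse_qs(urlsplit(url).query)


def may_crawl(robots_txt, url, user_agent="*"):
    """Check whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(query_parameters(RECORD_URL))
    print(may_crawl(ROBOTS_TXT, RECORD_URL))
```

Keep in mind that robots.txt only tells you what a site asks crawlers to do. Whether a search engine actually indexes parameter-laden pages is a separate matter that this sketch can't answer.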
Finally, some more info on Termium from around the web:
- The press release, from October 8th.
- Blog post from technical writer Diane Harms, pointing out that this also means that The Canadian Style is now freely available in searchable form.
- Quickie story from ITBusiness.ca. Dunno about that $1.1B figure, which seems more than a tad high; I’ll bet that amount is for some larger pot of money out of which came the funding for this project.
- Item on Digg.com from earlier this morning.