At Waterstons we like to sharpen our creative thinking and test out new technology to find out how it can solve problems and transform how we work. Our team decided to see if we could use technology to make finding relevant research materials in libraries more efficient. If you have recently spent time trying to find resources in a library, you’ll understand the frustration that comes from using library search terminals. Search terminals are touch screen computers that allow readers to search the library’s resources, giving them a list of results and a reference for each book.
The terminals I’ve come across are difficult to use because of an awkward touch screen and keyboard layout. They also require very specific search terms to return the right results; otherwise you can end up with an e-book, and who would want to read an e-book on a search terminal? To cut a long story short: they’re just not user friendly! We thought it would be a great idea to develop an application that allows students to search for books in a different way, a fully automated way!
Using Microsoft Cognitive Services, we utilised the Computer Vision API, which takes an image from a camera and returns the text on it using Optical Character Recognition (OCR). I am sure many of us have used OCR before when scanning an image and turning it into a PDF! Then, simply for demo purposes, we passed the recognised text to another API. First we tried the Academic Knowledge API, also part of the Cognitive Services offering, but disappointingly it only returned papers and journals on the subject searched, which was very limiting.
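To give a feel for the OCR step, here is a minimal Python sketch; it is not the code from our demo. It assumes the Computer Vision OCR operation is reachable at a `/vision/v3.2/ocr` path on your own resource endpoint and that the JSON response nests words inside lines inside regions. The path, the API version and the function names `extract_text` and `ocr_image` are illustrative assumptions, so check them against the current API reference before relying on them.

```python
import json
import urllib.request


def extract_text(ocr_result):
    """Flatten an OCR response (regions -> lines -> words) into one string per line."""
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


def ocr_image(image_bytes, endpoint, key):
    """Send raw image bytes to the OCR operation and return the parsed JSON.

    `endpoint` and `key` are placeholders for your own resource; the
    "/vision/v3.2/ocr" path is an assumption and may differ by API version.
    """
    request = urllib.request.Request(
        url=endpoint.rstrip("/") + "/vision/v3.2/ocr",
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a camera frame saved as bytes, `extract_text(ocr_image(frame, endpoint, key))` would give a list of text lines read from the book cover, ready to be passed on as a search query.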
We decided that the Google Books API was the best option. Google has been growing its books offering and the API was relatively straightforward to implement, allowing us to get many similar books on the subject area by simply submitting some text as a search query. If a student has a similar book, they can just present it to the system, which will return similar resources using text recognition. In an ideal environment, the Google Books results would be replaced by results from the internal library system, making searching for resources more interactive and as easy as waving a book across a camera.
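The lookup itself can be sketched in a few lines of Python. This is an illustration rather than our demo code: it assumes the public Google Books volumes endpoint with its `q` and `maxResults` query parameters, and the helper names `build_search_url`, `summarise_volumes` and `search_books` are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Public volumes search endpoint of the Google Books API.
BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query, max_results=5):
    """Build the search URL for a free-text query, e.g. text read from a cover."""
    params = urllib.parse.urlencode({"q": query, "maxResults": max_results})
    return f"{BOOKS_URL}?{params}"


def summarise_volumes(response):
    """Reduce a volumes response to a list of (title, authors) pairs."""
    books = []
    for item in response.get("items", []):
        info = item.get("volumeInfo", {})
        books.append((info.get("title", ""), info.get("authors", [])))
    return books


def search_books(query, max_results=5):
    """Query the API over the network and return (title, authors) pairs."""
    with urllib.request.urlopen(build_search_url(query, max_results)) as response:
        return summarise_volumes(json.load(response))
```

Keeping the URL building and the response parsing in separate functions means only `search_books` would need to change if the results came from an internal library system instead.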
There are other potential uses for the technology we investigated. A simple use would be to automate many basic data entry tasks. The API was very good at recognising text, and could therefore be used to pre-populate forms, requiring a user only to correct any recognition errors, potentially providing huge efficiency gains. Similarly, being able to automatically read a standard label and extract its text with a high degree of accuracy would allow luggage labels in an airport, or perhaps pallets in a warehouse, to be identified automatically, quickly and at minimal cost.