Our very own IBM Watson - approximated using open source technology, replicated in a non-supercomputer environment.
The goal is to build a system that can chew on a few open semantic databases, Wikipedia and Sbrm, and then answer general questions like “List the biggest nuclear explosions in Russia”, “When do I pay brmlab membership fees?” or “What was the best time travel movie?”. The primary language in this phase will be English. Then we can take things further: start supporting more advanced inference (“What resistance do I need in series with a random red LED on 5V?”), add some autonomous goal-based processing, etc. Only Strong AI is the limit!
We already have a working software stack with reasonable performance; we are currently focusing on reviewing it for bugs and wrapping it up for a milestone scientific paper publication. Speed has not been a focus so far. Accuracy on our 430-question trivia test set is a little above 30% as of Jan 2015.
We aim to do as little coding as possible, at least initially, instead focusing on integration of existing technologies. Most impressive initial results in the shortest time!
Live demo: http://live.ailao.eu/
Pre-print of the first paper on brmson: http://pasky.or.cz/dev/brmson/yodaqa-poster2015.pdf
(Original story on Pasky's blog: http://log.or.cz/?p=317)
A question-answering engine, “YodaQA” (custom-made from the ground up) by Pasky, is set up on his home server (AMD FX-8350, 24 GB RAM), together with an enwiki full-text index and DBpedia. It is connected to IRC and hangs out at #brmson on freenode.
All our code, including documentation and setup instructions, is open source and lives in the brmson organization on GitHub.
The choices we are currently running with are marked in bold.
All sources listed here must be freely available.
In our architecture, we can probably try / mix multiple unstructured data architectures.
In the long run, question answering may not be the only capability of the system, but it is an excellent starting point and benchmark.
Off-the-shelf solutions:
Custom (Watson-inspired) solution structure:
Notes about clustering: