project:brmson:start
project:brmson [2015/04/28 18:35] – pasky | project:brmson:start [2016/11/28 02:42] (current) – [(Historical) Knowledge Base] ruza | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | | ||
+ | {{template>: | ||
+ | name=Brmson| | ||
+ | image=brmson.png? | ||
+ | sw=-| | ||
+ | hw=-| | ||
+ | founder=[[user: | ||
+ | interested=| | ||
+ | status=active | ||
+ | }} | ||
+ | |||
+ | ~~META: | ||
+ | status = active | ||
+ | & | ||
+ | ~~ | ||
+ | |||
+ | Our very own [[wp>IBM Watson]] - approximated using open source technology, replicated in a non-supercomputer environment. | ||
+ | |||
+ | The goal is to build a system that can chew on a few open semantic databases, Wikipedia and Sbrm and then be able to answer general questions like "List the biggest nuclear explosions in Russia" | ||
+ | Then, we can take things further - start supporting more advanced inference ("What resistance do I need in series with a random red LED on 5V?"), add some autonomous goal-based processing, etc. Only Strong AI is the limit! | ||
+ | |||
+ | We already **have a working software stack** with reasonable performance, | ||
+ | |||
+ | We aim to do as //little// coding as possible, at least initially, instead focusing on integration of existing technologies. Most impressive initial results in the shortest time! :-) | ||
+ | |||
+ | **Homepage: [[http:// | ||
+ | |||
+ | **Live demo: [[http:// | ||
+ | |||
+ | Pre-print of the first paper on brmson: [[http:// | ||
+ | |||
+ | (Original story on Pasky' | ||
+ | |||
+ | ===== Status and Planning ===== | ||
+ | |||
+ | A question-answering engine " | ||
+ | |||
+ | All our code, including documentation and setup instructions, is **open source** and lives in the [[https:// | ||
+ | |||
+ | ===== (Historical) Knowledge Base ===== | ||
+ | |||
+ | Starting points: | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | * [[https:// | ||
+ | * [[http:// | ||
+ | * [[http:// | ||
+ | |||
+ | Our current choices are marked in bold. | ||
+ | |||
+ | ==== Data Sources ==== | ||
+ | |||
+ | All sources listed here must be freely available. | ||
+ | |||
+ | * Structured: | ||
+ | * WordNet | ||
+ | * YAGO | ||
+ | * DBpedia | ||
+ | * Unstructured: | ||
+ | * **Wikipedia**, | ||
+ | * TVTropes, Urban Dictionary | ||
+ | * News articles (theregister, | ||
+ | * Sbrm, laws, patents, ... | ||
+ | * IRC logs, Bitcoin forums, transcripts (lectures, Tetra), ... | ||
+ | |||
+ | ==== Unstructured Data Sources Interfaces ==== | ||
+ | |||
+ | * **[[wp> | ||
+ | * Industry standard, used by the IBM Watson team as well | ||
+ | * Extremely unstructured data: [[wp> | ||
+ | * Both unstructured data interface (with appropriate plugins) and a general processing framework | ||
+ | * [[wp> | ||
+ | * A popular(?) alternative | ||
+ | * [[wp> | ||
+ | * Python-based, | ||
+ | |||
+ | In our architecture, | ||
+ | |||
+ | ==== Question Answering Framework ==== | ||
+ | |||
+ | In the long run, question answering may not be the only capability of the system, but it is an excellent starting point and benchmark. | ||
+ | |||
+ | Off-the-shelf solutions: | ||
+ | * **OpenQA/ | ||
+ | * Open-source framework (on top of UIMA) that seems very close to the actual IBM Watson tech; https:// | ||
+ | * Some inspiration may come from the old website? See e.g. https:// | ||
+ | * [[http:// | ||
+ | * Full-fledged OAQA pipeline instances publicly available: | ||
+ | * We have rolled our own, **[[https:// | ||
+ | * [[https:// | ||
+ | * [[https:// | ||
+ | * UIMA-based http:// | ||
+ | * Small grad-student project, but it may be a good prototyping base; we got no response from the affiliated people | ||
+ | * OpenEphyra http:// | ||
+ | * Seems to be a somewhat obsolete, old-style solution, but ready to use; | ||
+ | * QA component of the " | ||
+ | * Simple, a bit hackish, tightly integrated with solr | ||
+ | |||
+ | Custom (Watson-inspired) solution structure: | ||
+ | * Parse the question in multiple independent ways, with assigned confidences | ||
+ | * Process the question in multiple independent ways (variety of sources etc.), with assigned confidences | ||
+ | * Generate and verify hypothetical answers | ||
+ | * Pick the highest-confidence answer(s) | ||
+ | * This can be heavily parallelized in the future. Confidences may be assigned using internal solvers' | ||
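The structure above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual brmson code: the names (`Hypothesis`, `answer`, the toy parsers and generators) are hypothetical, verification is folded into each hypothesis's own confidence, and multiplying the parse confidence by the hypothesis confidence is just one simple way to combine them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    answer: str
    confidence: float  # assumed to lie in [0, 1]


# A parser maps the question text to (interpretation, confidence) pairs.
Parser = Callable[[str], List[Tuple[str, float]]]
# A generator maps one interpretation to scored candidate answers.
Generator = Callable[[str], List[Hypothesis]]


def answer(question: str, parsers: List[Parser],
           generators: List[Generator], top_n: int = 1) -> List[Hypothesis]:
    # Parse the question in several independent ways.
    interpretations = [p for parser in parsers for p in parser(question)]
    # Generate hypotheses per interpretation and keep the best combined score
    # seen for each distinct answer (assumed rule: product of confidences).
    best: Dict[str, float] = {}
    for interpretation, parse_conf in interpretations:
        for generate in generators:
            for hyp in generate(interpretation):
                score = parse_conf * hyp.confidence
                best[hyp.answer] = max(best.get(hyp.answer, 0.0), score)
    # Pick the highest-confidence answer(s).
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [Hypothesis(a, c) for a, c in ranked[:top_n]]


# Toy components, purely for illustration.
def keyword_parser(q: str) -> List[Tuple[str, float]]:
    return [(q.lower().rstrip("?"), 0.9)]


def naive_parser(q: str) -> List[Tuple[str, float]]:
    return [(q, 0.5)]


def toy_kb_generator(interpretation: str) -> List[Hypothesis]:
    kb = {"capital of france": "Paris"}
    return [Hypothesis(v, 0.8) for k, v in kb.items() if k in interpretation]


def toy_guess_generator(interpretation: str) -> List[Hypothesis]:
    return [Hypothesis("Lyon", 0.3)]


if __name__ == "__main__":
    print(answer("What is the capital of France?",
                 [keyword_parser, naive_parser],
                 [toy_kb_generator, toy_guess_generator]))
```

Because every (interpretation, generator) pair is evaluated independently, the two inner loops are the natural place to parallelize later.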
+ | |||
+ | ==== Scaling Up ==== | ||
+ | |||
+ | Notes about clustering: | ||
+ | * Embarrassingly parallel | ||
+ | * Common: UIMA-AS + Hadoop | ||
+ | * https:// |
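"Embarrassingly parallel" here means that each candidate can be processed without talking to the others. As a rough single-machine sketch of that property (a local thread pool stands in for a real cluster; UIMA-AS and Hadoop are not shown, and `score_candidate` is a hypothetical placeholder for an expensive scoring step):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def score_candidate(candidate: str) -> float:
    """Hypothetical stand-in for an expensive, independent scoring step."""
    return len(candidate) / 10.0


def score_all(candidates: List[str], max_workers: int = 4) -> Dict[str, float]:
    # Each candidate is scored on its own, so the work maps cleanly onto
    # a pool of workers with no shared state between tasks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score_candidate, candidates))
    return dict(zip(candidates, scores))
```

The same map-style shape is what would let the work be spread across several machines instead of threads.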
project/brmson/start.txt · Last modified: 2016/11/28 02:42 by ruza