Posted By: Arnaud Saval
Date: 2012-04-19 14:08
Summary: First WebLab Bundle Released
This bundle crawls a local folder (toIndex), analyzes the text-based documents it finds, indexes them, and makes them accessible through a portal.
The processing capabilities are limited (only the default rules of the named-entity extraction engine are used), but the bundle provides a complete processing chain and eases the integration and testing of new components, whether in the processing chain or in the user interface.
This bundle is released regularly (http://weblab-project.org/index.php?title=Download) and built nightly with the latest services and portlets (see http://bamboo.ow2.org/browse/WEBLAB-BUNDLE).
This bundle presents an information retrieval system based on the complete WebLab architecture.
It is mainly composed of the following WebLab services:
- a homemade folder crawler that listens to a given folder and crawls its content (http://weblab-project.org/index.php?title=Folder_Listener),
- a normaliser that extracts the text content of various file types (MS Office, PDF, RTF, etc.) using Apache Tika (http://weblab-project.org/index.php?title=Normaliser_using_Tika),
- a named-entity extraction service that detects words in the documents and annotates them, based on a gazetteer (http://weblab-project.org/index.php?title=Simple_Gazetteer),
- an indexer that indexes the text content and makes it searchable, based on Apache Solr (http://weblab-project.org/index.php?title=Solr_Indexer/Searcher_WebLab_Web_Service).
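To illustrate the gazetteer idea used by the named-entity extraction service, here is a minimal sketch in Python. It is not the WebLab Simple Gazetteer implementation or its API: the `GAZETTEER` entries, the `annotate` function and the tuple layout are all hypothetical names chosen for this example. It only shows the general technique of looking up known terms in a text and recording where they occur.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: surface form -> entity type (illustrative entries only).
GAZETTEER: Dict[str, str] = {
    "Paris": "Location",
    "OW2": "Organisation",
}


def annotate(text: str, gazetteer: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, term, entity type) for every gazetteer term found in text."""
    annotations = []
    for term, entity_type in gazetteer.items():
        # Match the term only on word boundaries, so "Paris" does not hit "Parisian".
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text):
            annotations.append((match.start(), match.end(), term, entity_type))
    # Sort by position so annotations follow the reading order of the text.
    return sorted(annotations)


if __name__ == "__main__":
    for start, end, term, entity_type in annotate("OW2 hosts WebLab in Paris.", GAZETTEER):
        print(start, end, term, entity_type)
```

A real service would attach such annotations to the document itself rather than return plain tuples, and would load its terms from configurable rule files instead of a hard-coded dictionary.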
In addition to these services, the bundle includes some technical services.
The demo also contains a WebLab chain that links the previously mentioned services, and four WebLab portlets:
- a launchCrawl portlet that launches and monitors the processing of documents by the chain,
- a search portlet that sends queries to the Solr searcher,
- a result portlet that displays the results of the query,
- an annotated document portlet that displays the document with the annotations added by the named-entity extraction service.
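The way a chain passes each crawled document through the services in turn can be sketched as follows. This is only an illustration under stated assumptions, not how WebLab chains are actually defined: `crawl`, `normalise`, `tag_capitalised` and `run_chain` are hypothetical names, the "normaliser" merely reads plain UTF-8 text instead of calling Apache Tika, the "extractor" flags capitalised words instead of using a gazetteer, and a Python list stands in for the Solr index.

```python
from pathlib import Path
from typing import Callable, Dict, Iterator, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


def crawl(folder: Path) -> Iterator[Document]:
    """Yield one document record per regular file in the folder."""
    for path in sorted(folder.iterdir()):
        if path.is_file():
            yield {"source": str(path)}


def normalise(doc: Document) -> Document:
    # Stand-in for text extraction: only plain UTF-8 text files are handled here.
    doc["text"] = Path(str(doc["source"])).read_text(encoding="utf-8")
    return doc


def tag_capitalised(doc: Document) -> Document:
    # Stand-in for entity extraction: keep the words that start with a capital letter.
    doc["entities"] = [w for w in str(doc["text"]).split() if w[:1].isupper()]
    return doc


def run_chain(folder: Path, stages: List[Stage], index: List[Document]) -> List[Document]:
    """Apply every stage, in order, to each crawled document, then store the result."""
    for doc in crawl(folder):
        for stage in stages:
            doc = stage(doc)
        index.append(doc)  # stand-in for sending the document to the indexer
    return index


if __name__ == "__main__":
    # "toIndex" is the folder name mentioned in the announcement; it must exist locally.
    for doc in run_chain(Path("toIndex"), [normalise, tag_capitalised], []):
        print(doc["source"], doc["entities"])
```

Keeping each stage as an independent function mirrors the point made in the announcement: a new component can be swapped into the chain for testing without touching the others.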