Flamenco Installation Guide

 
Last Updated: June 15, 2005
Introduction to Flamenco
Obtain the necessary software and code
Convert your dataset to the appropriate format
Prepare to run installation scripts
Run installation scripts
> Fine tune interface features
Resources


Fine tune interface features

Set up full text searching

We suggest two options for providing full-text searching: use MySQL's fulltext indexing, or use Lucene to index and search the collection.

Use MySQL's fulltext index feature

The text table is used for keyword-based text search. To implement this, we use MySQL's fulltext index feature. When a fulltext index is created on the text column, MySQL automatically builds an inverted index of all the words in that column.

A table to support this feature is created by the install_flamenco.py script. The script also populates the table with information from the text.tsv file. The script then runs the following MySQL command, adding a fulltext index.

ALTER TABLE text ADD FULLTEXT KEY text (text(200));

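Note: "text" is the name of the table, of the column being indexed, and of the index key itself. The "(200)" is a prefix length. MySQL requires a prefix length for ordinary indexes on a text column, since those can only index a prefix of the column. For fulltext indexes, however, the MySQL manual states that the entire column is always indexed and that any prefix length is ignored, so the "(200)" does not limit what can be searched.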
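Once the index exists, keyword queries against it use MySQL's MATCH ... AGAINST syntax. The Python 3 sketch below only builds the query string and its parameter; it is not taken from the Flamenco source. The function name, the use of SELECT *, and the sample keywords are assumptions made for illustration, and the %s placeholder follows the parameter style used by common Python MySQL drivers.

```python
def build_fulltext_query(keywords):
    """Return (sql, params) for a keyword search on the text table.

    Illustrative only: the function name is hypothetical and this is not
    Flamenco's own search code.
    """
    # Backticks keep the table and column name "text" unambiguous.
    sql = "SELECT * FROM `text` WHERE MATCH (`text`) AGAINST (%s)"
    # Passing the keywords as a parameter lets the driver do the escaping.
    return sql, (keywords,)


sql, params = build_fulltext_query("impressionist landscape")
print(sql)
print(params)
```

The returned pair can be handed to a database cursor's execute() method, which keeps user-typed keywords out of the SQL string itself.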

Use Lucene to index the collection

Indexing with Lucene is a bit trickier than using MySQL's fulltext index, but it allows for more powerful searches. MM.MySQL is the JDBC driver used to read the database during indexing. To use Lucene, add a "luceneindex" variable to your collection-specific module, with its value set to the pathname of the directory containing the index for the collection. For example, ours looks like this:

luceneindex = '/projects/flamenco/lucene/arts'

However, this is completely dependent on where you put your index. To index a database, go to the lucene subdirectory of the location where you unzipped the Flamenco system files, which will look something like:

cd /projects/flamenco/lucene

Then, type the following to index the database:

java -cp .:lucene-1.2.jar:mm.mysql-2.0.14-bin.jar Index <directory> <dbname>

where <directory> is the directory where you want to put the index and <dbname> is the name of the database to index. To search a database, go to the lucene subdirectory and type the following:

java Search <directory>
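If you re-index often, the indexing command can be wrapped in a small script. The following Python 3 sketch is one way to do that and is not part of Flamenco; the function names are hypothetical, and the database name "arts" in the usage line is an assumption. The classpath and jar names are the ones given in the indexing command above.

```python
import subprocess

# Classpath copied from the indexing command above; jar names come from the guide.
CLASSPATH = ".:lucene-1.2.jar:mm.mysql-2.0.14-bin.jar"


def build_index_command(index_dir, dbname):
    """Return the argument list for the Index command."""
    return ["java", "-cp", CLASSPATH, "Index", index_dir, dbname]


def run_index(lucene_dir, index_dir, dbname):
    """Run the indexer from the lucene subdirectory; raises if java fails."""
    cmd = build_index_command(index_dir, dbname)
    # cwd is needed because the classpath is relative ("." and the jars).
    subprocess.run(cmd, cwd=lucene_dir, check=True)


# Print the command instead of running it, so the sketch is safe to try.
print(" ".join(build_index_command("/projects/flamenco/lucene/arts", "arts")))
```

Calling run_index("/projects/flamenco/lucene", "/projects/flamenco/lucene/arts", "arts") would then perform the same steps as the cd and java commands above, assuming those paths match your installation.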

Activate personalization

Personalization allows users to customize the appearance of their Flamenco browsing experience. The personalization tables are already set up by the install_flamenco.py script. It is, however, up to you whether to activate them. You can enable or disable this feature by setting the constant USER_PERSONALIZATION to 1 or 0, respectively, in the collection-specific .py file.
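As a rough illustration, the two settings named in this guide might sit together in a collection-specific module as follows. This is a hypothetical excerpt, not a copy of any shipped file: only the names luceneindex and USER_PERSONALIZATION come from this guide, and the path is the example value used earlier.

```python
# Hypothetical excerpt from a collection-specific module such as arts.py.
# Only the two setting names below come from this guide; the rest of the
# real module is not shown.

# Directory holding the Lucene index for this collection.
luceneindex = '/projects/flamenco/lucene/arts'

# 1 enables personalization, 0 disables it.
USER_PERSONALIZATION = 1
```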

Modify interface code to accommodate functionality specific to your dataset

It is quite likely that a new dataset will require some functionality specific to that collection. For example, in the Fine Arts dataset, we needed to specify a URL from which the photos can be fetched. To implement this, we created a file, arts.py, which defines a Collection class containing all the new functionality. One aspect of the interface code you will definitely need to modify is the code that dictates how the results are displayed: how many items appear on each row, how large each item is (if showing images), how many words are listed (if dealing with document abstracts), and what metadata is displayed with each item. Examples of collection Python files can be found in the Resources section.
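To make these display decisions concrete, here is a minimal Python 3 sketch of two helpers a collection file might contain: one that builds the URL an image is fetched from, and one that limits how many words of an abstract are shown. Both function names, the URL scheme, and the example.org address are hypothetical; they do not describe the actual contents of arts.py.

```python
def image_url(base_url, item_id, extension=".jpg"):
    """Build the URL an item's image is fetched from (hypothetical scheme)."""
    return base_url.rstrip("/") + "/" + str(item_id) + extension


def first_words(text, max_words):
    """Return at most max_words words of text, adding "..." if it was cut."""
    words = text.split()
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."


print(image_url("http://example.org/images/", 42))
print(first_words("A study of light and color in late landscapes", 4))
```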

Example. Images vs. text

The way items are displayed depends on the type of collection. In our experience, one major distinction is between image-based collections and text-based ones, because different media lend themselves to different display formats. These formats should be accounted for in the collection-specific .py file. As an example, consider the Fine Arts collection and the Tobacco collection. When displaying images in the middlegame, it made sense to display four per row. For a text-based collection, where title strings are displayed, that format makes less sense; it is better to display only one item per row. For further details, see the arts.py and tobacco.py files in the Resources section.
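The four-per-row versus one-per-row choice can be sketched in a few lines of Python 3. This is an illustration under assumed names, not the logic in arts.py or tobacco.py: the ITEMS_PER_ROW table, the "image" and "text" keys, and the layout_rows function are all hypothetical.

```python
# Hypothetical per-collection settings: four images per row, one title per row.
ITEMS_PER_ROW = {"image": 4, "text": 1}


def layout_rows(items, collection_type):
    """Split items into display rows for the given collection type."""
    per_row = ITEMS_PER_ROW[collection_type]
    return [items[i:i + per_row] for i in range(0, len(items), per_row)]


items = ["a", "b", "c", "d", "e"]
print(layout_rows(items, "image"))
print(layout_rows(items, "text"))
```

Keeping the row size in one per-collection setting means the display code itself does not need to change when a new collection type is added.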


Questions? Comments? Contact Kevin Li (kevinli@sims.berkeley.edu)