Previous: Run installation scripts
Fine tune interface features
Set up full text searching
We suggest two options for full-text searching: use MySQL's fulltext indexing, or use Lucene to index and search the collection.
Use MySQL's fulltext index feature
The text table is used for keyword-based text search. To implement this, we use MySQL's fulltext index feature. When a fulltext index is created on the text column, MySQL automatically builds an inverted index of all the words in that column.
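To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not MySQL's implementation: the sample rows are made up, and the matching rule (every query word must appear) is a simplification that ignores MySQL's stopwords, minimum word length, and relevance ranking.

```python
import re
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercased word to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(row_id)
    return index


def search(index, query):
    """Return the ids of rows that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up sample data standing in for the text column.
rows = {
    1: "Portrait of a woman",
    2: "Landscape with river",
    3: "Woman by the river",
}
index = build_inverted_index(rows)
matches = search(index, "woman river")
```

Because the index maps words to rows, a lookup touches only the entries for the query words instead of scanning every row.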
A table to support this feature is created by the install_flamenco.py script. The script also populates the table with information from the text.tsv file. The script then runs the following MySQL command, adding a fulltext index.
ALTER TABLE text ADD FULLTEXT KEY text (text(200));
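Note: "text" is the name of the table as well as of the column used for the fulltext index. The "(200)" is a prefix length, which MySQL requires for ordinary indexes on a text column because only a prefix of such a column can be indexed. For FULLTEXT indexes, however, the MySQL documentation states that the whole column is always indexed and any prefix length is ignored.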
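Once the index exists, a keyword search can be expressed with MySQL's MATCH ... AGAINST syntax. The sketch below only builds the query; it is not taken from the Flamenco code, the function name build_fulltext_query is hypothetical, and the "%s" placeholder assumes a DB-API driver that uses the "format" parameter style (as MySQLdb does).

```python
def build_fulltext_query(keywords):
    """Return (sql, params) for a keyword search against the text table.

    The table and the column are both named "text", so both are backquoted.
    """
    # Collapse runs of whitespace so the driver receives a clean term list.
    terms = " ".join(keywords.split())
    sql = "SELECT * FROM `text` WHERE MATCH (`text`) AGAINST (%s)"
    return sql, (terms,)


sql, params = build_fulltext_query("  impressionist   landscape ")
```

With an open cursor, the pair would be passed as cursor.execute(sql, params), which leaves quoting of the search terms to the driver.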
Use Lucene to index the collection
Indexing with Lucene is a bit trickier than using MySQL's fulltext index, but it allows for more powerful searches. MM.MySQL is the JDBC driver used to read the database during indexing. To use Lucene, add a "luceneindex" variable to your collection-specific module, with its value set to the pathname of the directory containing the index for the collection. For example, ours looks like this:
luceneindex = '/projects/flamenco/lucene/arts'
However, the path depends entirely on where you put your index. To index a database, go to the lucene subdirectory of the directory where you unzipped the Flamenco system files, which will look something like:
cd /projects/flamenco/lucene
Then type the following to index the database:
java -cp .:lucene-1.2.jar:mm.mysql-2.0.14-bin.jar Index <directory> <dbname>
where <directory> is the directory where you want to put the index and <dbname> is the name of the database to index. To search a database, go to the lucene subdirectory and type the following:
java Search <directory>
Activate personalization
Personalization allows users to customize the appearance of their Flamenco browsing experience. The personalization tables are already set up by the install_flamenco.py script. It is, however, up to you whether to activate them. You can enable or disable this feature by setting the constant USER_PERSONALIZATION to 1 or 0, respectively, in the collection-specific .py file.
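As a rough sketch, the settings part of a collection-specific module might look like the following. The two variable names come from this page; the helper personalization_enabled and its treatment of a missing constant as "disabled" are assumptions made for illustration, not Flamenco's actual behavior.

```python
# Hypothetical excerpt of a collection-specific module such as arts.py.

# Directory holding the Lucene index for this collection
# (the example path from the Lucene section; adjust to your setup).
luceneindex = '/projects/flamenco/lucene/arts'

# 1 enables personalization, 0 disables it.
USER_PERSONALIZATION = 1


def personalization_enabled(module_vars):
    """Return True only if USER_PERSONALIZATION is set to 1.

    Assumption: a missing constant is treated the same as 0.
    """
    return module_vars.get("USER_PERSONALIZATION", 0) == 1
```

For an imported module, the check could be called as personalization_enabled(vars(module)).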
Modify interface code to accommodate functionality specific to your dataset
A new dataset will quite likely require some functionality specific to that collection. For example, in the Fine Arts dataset, we needed to specify a URL from which the photos can be fetched. To implement this, we created a file arts.py that defines a Collection class containing all the new functionality. One aspect of the interface code you will definitely need to modify is the code that dictates how results are displayed: how many items appear on each row, how large each item is (if showing images), how many words are listed (if dealing with document abstracts), and what metadata is displayed with each item. Examples of collection Python files can be found in the Resources section below.
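To give a feel for what such a class could contain, here is a minimal sketch of a Collection class that knows where to fetch photos. It is not the real arts.py: the attribute image_base_url, the method image_url, the ".jpg" naming scheme, and the example.org address are all hypothetical, and the code assumes Python 3.

```python
from urllib.parse import quote


class Collection:
    """Hypothetical collection class; the actual arts.py interface may differ."""

    # Placeholder address, not the real Fine Arts image server.
    image_base_url = "http://images.example.org/arts/"

    def image_url(self, item_id):
        """Return the URL from which the photo for item_id can be fetched."""
        # quote() escapes characters that are not safe inside a URL path.
        return self.image_base_url + quote(str(item_id)) + ".jpg"


arts = Collection()
url = arts.image_url(42)
```

Keeping the base URL as a class attribute means a different collection can override it without touching the display code.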
Example. Images vs. text
The display of items depends on the type of collection. In our experience, one major distinction is between image-based and text-based collections, since different media lend themselves to different display formats. These formats should be accounted for in the collection-specific .py file. As an example, consider the Fine Arts collection and the Tobacco collection. When displaying images in the middlegame, it made sense to show four per row. For a text-based collection where title strings are displayed, that format makes less sense, and it is better to show only one item per row. For further details, see the arts.py and tobacco.py files below.
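The four-per-row versus one-per-row choice can be sketched as a per-collection setting that the display code consults. The class names, the items_per_row attribute, and the layout_rows helper below are hypothetical and are not taken from arts.py or tobacco.py.

```python
class ImageCollection:
    """Stand-in for an image-based collection such as Fine Arts."""
    items_per_row = 4


class TextCollection:
    """Stand-in for a text-based collection such as Tobacco."""
    items_per_row = 1


def layout_rows(items, collection):
    """Split items into display rows of collection.items_per_row each."""
    n = collection.items_per_row
    return [items[i:i + n] for i in range(0, len(items), n)]


image_rows = layout_rows(["img1", "img2", "img3", "img4", "img5"], ImageCollection)
text_rows = layout_rows(["Title A", "Title B"], TextCollection)
```

Here the five images fall into a full row of four plus a row of one, while each title gets a row of its own.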
Next: Resources
Questions? Comments? Contact Kevin Li (kevinli@sims.berkeley.edu)