Flamenco Installation Guide

 
Last Updated: June 15, 2005

 

Convert your dataset to the appropriate format

Determine facets and attributes

When converting a new dataset, you need to classify each feature in the dataset as either an attribute, which appears only in the metadata displayed alongside an item, or a facet, which users can browse the dataset by.

For example, in the Fine Arts Museum collection, "Media," "Location," and "Date" are facets because different photos can share the same location, media type or date, and the user may want to search for all photographs in a certain medium (such as drawing or sculpture). On the other hand, the image record number is an attribute because few users will want to search for all photographs with a certain record number, though they are likely to want that information once they locate a useful photograph.

Note that facets are browsable item characteristics. Attributes, in contrast, are only shown after an image is found.

Create tab-delimited (tsv) files

You will need to create the following tab-delimited text files:

(Note: Large tab-delimited files can be easily manipulated using Excel. Samples of all the files you need to generate can be found in the Resources section.)

attrs.tsv and facets.tsv

attrs.tsv should be a list of the attributes you've decided on for your system. Each row in this file should represent a single attribute. For each of these rows, the first column should be the underlying system name for this attribute. The second column should be the display name you'd like users to see. A simplified attrs.tsv file for a collection of articles might look something like:

item	PMID
title	Title

This file is most easily generated by hand.

Similarly, facets.tsv should have a row for each of the facets you'd like your system to use. As in attrs.tsv, the first two columns of this file should be the underlying system name and the display name, respectively. The third column is a textual description of, or comment on, the facet; it is not used by the system but should be included for later reference. A simplified facets.tsv file might look something like:

journal	Journal	Short name of the journal in which article appears
date_created	Date	Date the article was created

Note that the identifiers in the first column of both files must be a single word. In attrs.tsv, each identifier should match the corresponding column name in the items table. Remember that attributes show up only in the endgame view (the display of an individual item), whereas facets are used for navigation.
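If you would rather generate these two files with a script than by hand, the following is a minimal sketch using the sample rows above. The helper name to_tsv is hypothetical, and the reading of "one word" as letters, digits, and underscores only is an assumption, not something the installation scripts are known to enforce.

```python
import csv
import io
import re

# Sample rows taken from the examples above.
attrs = [("item", "PMID"), ("title", "Title")]
facets = [
    ("journal", "Journal", "Short name of the journal in which article appears"),
    ("date_created", "Date", "Date the article was created"),
]


def to_tsv(rows):
    """Serialize rows as tab-delimited text, one row per line."""
    for row in rows:
        # Assumption: a one-word identifier contains only letters,
        # digits, and underscores.
        if not re.fullmatch(r"\w+", row[0]):
            raise ValueError(f"identifier must be a single word: {row[0]!r}")
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


attrs_tsv = to_tsv(attrs)    # text to save as attrs.tsv
facets_tsv = to_tsv(facets)  # text to save as facets.tsv
```

Write each returned string to the corresponding file. An identifier such as "date created" would raise a ValueError before anything is written.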

items.tsv

This file should contain one line for each item in your collection. Every row should contain a value for every attribute your system is using. (Note: Column headings are not included in the actual file.) A collection with attribute fields RecordID, color, and date might look something like:

568945	blue	02-03-2001
938932	red	04-30-1999
934983	green	02-22-2000

Note that the first column of every row must be a unique identifier for the item.
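Both requirements (a value for every attribute, and a unique identifier in the first column) are easy to check before you run the installation scripts. The sketch below is one way to do it; check_items is a hypothetical helper, and treating an empty field as a missing value is an assumption.

```python
import csv
import io


def check_items(tsv_text, num_attrs):
    """Return a list of problems found in items.tsv-style text."""
    problems = []
    seen = set()
    # QUOTE_NONE makes the reader treat quote characters as ordinary text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        # Assumption: an empty field counts as a missing value.
        if len(row) != num_attrs or any(value == "" for value in row):
            problems.append(
                f"line {line_no}: expected {num_attrs} non-empty values")
        if row:
            if row[0] in seen:
                problems.append(
                    f"line {line_no}: duplicate identifier {row[0]!r}")
            seen.add(row[0])
    return problems


sample = (
    "568945\tblue\t02-03-2001\n"
    "938932\tred\t04-30-1999\n"
    "934983\tgreen\t02-22-2000\n"
)
problems = check_items(sample, num_attrs=3)  # empty list: nothing to fix
```

To check a real file, pass its contents as tsv_text and the number of rows in attrs.tsv as num_attrs.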

[facetname]_hierarchy.tsv (for every facet you decide to have)

Each row in these files should represent a node in the facet hierarchy. The first column should be an identifier for that node, to be used in [facetname]_item_mapping.tsv. Subsequent columns should be the values for that node, listed from general to specific, left to right. A portion of this file for the location facet, location_hierarchy.tsv, might look something like:

1	United States	California	San Francisco
2	United States	California	Berkeley
3	United States	Washington	Seattle

Note that the first column of every row must be a unique identifier for that node.
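If your source data already stores each facet value as a general-to-specific path, a hierarchy file can be generated rather than typed. This is a minimal sketch: build_hierarchy is a hypothetical helper, the sequential integer identifiers are only one possible numbering scheme, and it emits a row only for the paths you supply (whether intermediate nodes such as "United States" alone need their own rows is not covered here).

```python
def build_hierarchy(paths):
    """Assign a unique integer identifier to each distinct path.

    Returns the path-to-identifier table and the tab-delimited file text.
    """
    node_ids = {}
    for path in paths:
        key = tuple(path)
        if key not in node_ids:
            # Number nodes 1, 2, 3, ... in order of first appearance.
            node_ids[key] = len(node_ids) + 1
    lines = ["\t".join([str(node_id), *key])
             for key, node_id in node_ids.items()]
    return node_ids, "\n".join(lines) + "\n"


paths = [
    ["United States", "California", "San Francisco"],
    ["United States", "California", "Berkeley"],
    ["United States", "Washington", "Seattle"],
    ["United States", "California", "Berkeley"],  # repeat: reuses its id
]
node_ids, hierarchy_tsv = build_hierarchy(paths)
```

Keeping node_ids around is useful afterwards, since the same identifiers are needed when writing the item mapping file for that facet.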

[facetname]_item_mapping.tsv (for every facet you decide to have)

Each row in these files assigns one item to one node of the facet hierarchy: the first column is the item's identifier and the second is a node identifier from [facetname]_hierarchy.tsv. This is perhaps best described with an example. Consider once again the location facet. If we are using the location_hierarchy.tsv file from above, our location_item_mapping.tsv file might look something like:

75635	1
434543	1
645654	3
534454	2

This would indicate that item 75635 has location values "United States->California->San Francisco." Likewise, item 645654 would have location values "United States->Washington->Seattle." For facets where an item might have multiple values ("multi-valued facets"), simply include multiple rows for that item, one per value.
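To see how the two files fit together, the sketch below resolves each mapping row to its full path and rejects rows that point at a node missing from the hierarchy file. The final mapping row is a hypothetical addition that makes item 645654 multi-valued; the parse helper and variable names are illustrative only.

```python
from collections import defaultdict

hierarchy_tsv = (
    "1\tUnited States\tCalifornia\tSan Francisco\n"
    "2\tUnited States\tCalifornia\tBerkeley\n"
    "3\tUnited States\tWashington\tSeattle\n"
)
mapping_tsv = (
    "75635\t1\n"
    "434543\t1\n"
    "645654\t3\n"
    "534454\t2\n"
    "645654\t2\n"  # hypothetical extra row: a second value for 645654
)


def parse(text):
    """Split tab-delimited text into rows, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line]


# Node identifier -> list of values, general to specific.
nodes = {row[0]: row[1:] for row in parse(hierarchy_tsv)}

# Item identifier -> every path assigned to that item.
item_values = defaultdict(list)
for item_id, node_id in parse(mapping_tsv):
    if node_id not in nodes:
        raise ValueError(f"item {item_id} refers to unknown node {node_id}")
    item_values[item_id].append("->".join(nodes[node_id]))
```

After this runs, item_values["645654"] holds two paths, one per mapping row, which is how a multi-valued facet is represented.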

fulltext.tsv

This file supports full-text searching and is only necessary if you plan to choose MySQL full-text searching later, as opposed to Lucene. It can be generated by hand. For every item in your collection, provide any text associated with that item. The format of the file should be as follows:

001	all the text associated with item 001
002	all the text associated with item 002
003	all the text associated with item 003
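Because each item occupies exactly one row, any tabs or line breaks inside an item's text would break the file's layout. If you generate the file with a script, a sketch like the following flattens the text first. The function fulltext_line is hypothetical, and collapsing all whitespace runs to a single space is an assumption about what is acceptable for your collection.

```python
import re


def fulltext_line(item_id, text):
    """Build one fulltext.tsv row: the item identifier, a tab, then the text."""
    # Collapse tabs, newlines, and repeated spaces so the text stays
    # in a single column on a single line.
    flat = re.sub(r"\s+", " ", text).strip()
    return f"{item_id}\t{flat}"


line = fulltext_line("001", "all the\ttext\nassociated  with item 001")
```

Join the resulting lines with newline characters to produce the file contents.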

sortkeys.tsv

This file tells the system which attributes or facets you want to sort by. The first column should provide the display value for the sorting option. The second column should provide the name of the facet or attribute in the underlying system. That is, every value in the second column should be found in either facets.tsv or attrs.tsv. For a system cataloging publications, the file might look something like this:

Journal	journal
Date	date_created
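The rule that every sort key must name an existing facet or attribute can be checked mechanically. The following sketch reports any second-column values in sortkeys.tsv that appear in neither file; unknown_sort_keys is a hypothetical helper, and the file contents are the samples from this page.

```python
def unknown_sort_keys(sortkeys_tsv, facets_tsv, attrs_tsv):
    """Return sort key names that appear in neither facets.tsv nor attrs.tsv."""

    def column(text, index):
        # Pull one column out of tab-delimited text, skipping blank lines.
        return [line.split("\t")[index]
                for line in text.splitlines() if line]

    known = set(column(facets_tsv, 0)) | set(column(attrs_tsv, 0))
    return [name for name in column(sortkeys_tsv, 1) if name not in known]


attrs_tsv = "item\tPMID\ntitle\tTitle\n"
facets_tsv = (
    "journal\tJournal\tShort name of the journal in which article appears\n"
    "date_created\tDate\tDate the article was created\n"
)
sortkeys_tsv = "Journal\tjournal\nDate\tdate_created\n"

missing = unknown_sort_keys(sortkeys_tsv, facets_tsv, attrs_tsv)
```

An empty result means every sort key resolves; a misspelled name would be returned in the list.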
 


 

Questions? Comments? Contact Kevin Li (kevinli@sims.berkeley.edu)