The JSTOR search engine provides matches to search terms with both natural language and fielded searches. Regardless of the type of catalogued item, JSTOR provides matches to text and images using words provided by users. When reviewing your collection’s metadata, this implies:
- Generally, more text is better.
- A unique title for each item will increase discoverability.
- A unique description of each item will increase discoverability.
- Information about the item itself will increase the number of ways your item is useful to users for different tasks in their research.
JSTOR faces a global audience. When selecting the title of your collection, or items, consider that people with different worldviews will discover your collection.
For example, a collection titled The 1980s is very broad. By naming your collection so broadly, you may be implying that the content covers the important events, decisions, and actions of that period, which is misleading if your collection is actually much more focused. Consider how someone from South Africa would view The 1980s: they may expect to see something about the anti-Apartheid movement.
Another example is the words Civil War. An American audience would most likely take this to mean the American Civil War (1861-1865); others around the world could assume that it is the Russian Civil War, or one of the civil wars on the African continent. Label for the global audience. If the collection you originally titled The 1980s is in fact a selection of school yearbooks from that decade, consider a more specific title such as University of Michigan Yearbooks from the 1980s, which is much more descriptive and does not erase other major events and trends of this period.
It is also important to be cognizant of the harmful biases that may be depicted in your items and to provide context in the description of your collection as well as in individual items. As a library resource, we understand that we must provide access to information that is offensive or considered harmful in a modern context. If an item depicts racial bias, gender bias, bias against a specific sexual orientation, ethnic bias, nationalistic bias, religious bias, bias against people with disabilities, or violence, we kindly ask that you explain the inclusion of the item in your collection in the item’s description field so that users have a reference point to discover more information. Ideally, the supplementary information that you provide will enable users to think critically about what is depicted in your items.
While these are general best practices for your metadata, these suggestions also have positive search-related side effects. Your items will have additional contextual information, so they will match more keywords, thereby matching and ranking in ways more helpful to users on JSTOR.
A common search tactic used by users on JSTOR is to filter results by their date. This enables users to focus their search to items created within a specific range of years. While an item date is not required to publish to JSTOR, a date is required for those items to match searches that include date range criteria.
JSTOR search users can also sort their results by date in ascending or descending order. Items that do not include a date are sorted to the bottom when either ascending or descending order is chosen by a user. This functionality is enabled by the Precise Date field on your items.
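The sorting behavior described above can be pictured with a minimal Python sketch. It is an illustration only, not JSTOR's implementation: the record layout, the sample titles, and the `sort_by_date` helper are all hypothetical.

```python
# Hypothetical records: (title, year). None means the item has no Precise Date.
items = [
    ("Yearbook", 1984),
    ("Undated photograph", None),
    ("Pamphlet", 1979),
]


def sort_by_date(records, descending=False):
    """Sort dated records by year; undated records always go last."""
    dated = [r for r in records if r[1] is not None]
    undated = [r for r in records if r[1] is None]
    dated.sort(key=lambda r: r[1], reverse=descending)
    # Undated records are appended after the dated ones in both directions.
    return dated + undated


ascending = [title for title, _ in sort_by_date(items)]
descending = [title for title, _ in sort_by_date(items, descending=True)]
```

In both orders the undated item ends up at the bottom of the list, which is why adding a date keeps an item visible to users who sort by date.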
Some items do not have a precise date. For example, an Ancient Greek statue may only be known to have been created within some range of decades two thousand years ago. JSTOR Forum provides a means to catalogue your items with a date range using Temporal Coverage or Earliest Date / Latest Date. We recommend cataloguing your items with a Temporal Coverage or Earliest Date / Latest Date in JSTOR Forum if applicable.
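To see why a date range helps with date filtering, consider the sketch below. It assumes, purely for illustration, that an item matches a date filter when its Earliest Date / Latest Date range overlaps the years the user selected; JSTOR's actual matching rule is not described here. The function name, the tuple layout, and the sample years are hypothetical.

```python
from typing import Optional, Tuple

# (earliest_year, latest_year); None means the item has no date at all.
DateRange = Optional[Tuple[int, int]]


def matches_date_filter(item_range: DateRange, start: int, end: int) -> bool:
    """Return True if the item's date range overlaps the filter range.

    Undated items never match, mirroring the rule that a date is required
    to match searches that include date range criteria. The overlap test
    itself is an assumption made for this sketch.
    """
    if item_range is None:
        return False
    earliest, latest = item_range
    return earliest <= end and latest >= start


# Negative years stand for BCE in this sketch.
statue = (-150, -100)
yearbook = (1984, 1984)
undated = None
```

Under these assumptions, `matches_date_filter(statue, -200, -120)` is true even though the statue's exact year is unknown, while `matches_date_filter(undated, 1980, 1989)` is always false.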
See also Field-specific Mapping Recommendations.
For images and other non-textual item types
JSTOR search takes keywords provided by the user and matches them in the item metadata. In the case of non-text content, the metadata is primarily what makes your item match keywords provided by users. You can find information below about what metadata fields are used in different search experiences on JSTOR.
Default and Available Search Fields for JSTOR Collections Content
|Search index field name||Description||Search fields for Basic, Advanced, and Collection-level searches for image and primary source collections||Available to search the specified field only in any search box with a search that contains exact field name and value(s) formatted as shown (*see example below)|
|au||authors, creators, other contributors to a work||✅||✅|
|cc_container_title_tokenized||container title for searching||✅||✅|
|cc_custom_fields||searchable field for custom fields||✅||✅|
|cc_image_view_description||community collection image view description||✅||✅|
|cc_locations||community collection locations||✅||✅|
|cc_portal_title_tokenized||portal title for searching||✅||✅|
|doi||JSTOR identifier that is formatted like a DOI but is not a registered DOI, the suffix of which is in the JSTOR stable URL. Searching this field does NOT search an item's registered DOI as found in the item-level metadata record labeled "DOI". (See "identifiers" below)||✅||✅|
|ocr||ocr data or text of digitized content||✅||✅|
|identifiers||Contains the data for all Shared Collection identifiers: Identifier, Accession Number, DOI, eISBN, eISSN, ISBN, ISSN, OCLC Number, and Local||✅||✅|
*Example: "ti:(mona lisa)" (searches title field) "identifiers:(LML_MS-050-00198)" (searches different identifiers)
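If you build fielded searches often, for instance to spot-check that your identifiers are indexed, a small helper can produce the `field:(value)` format shown in the example. This is a minimal sketch: `fielded_query` is a hypothetical helper, not part of any JSTOR tool, and it only formats the text to type into a search box.

```python
def fielded_query(field: str, value: str) -> str:
    """Format a search that targets one index field, e.g. ti:(mona lisa)."""
    field = field.strip()
    value = value.strip()
    if not field or not value:
        raise ValueError("both a field name and a value are required")
    return f"{field}:({value})"


title_query = fielded_query("ti", "mona lisa")
identifier_query = fielded_query("identifiers", "LML_MS-050-00198")
```

The two resulting strings are identical to the searches in the example above and can be pasted into any JSTOR search box.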
While you can enable OCR for your project in JSTOR Forum, please ensure that the OCR data produced is accurate. If you use OCR to process item types other than documents, you may end up with inaccurate data, which can cause your items to match random, irrelevant, or possibly harmful or inappropriate queries. To avoid this, process only documents with OCR and verify the accuracy of the OCR output.
JSTOR recommends applying a Resource Type from our controlled list of values (available as a list in Forum). Each Resource Type then rolls up to a broader Content Type (Books, Documents, Images, and Serials). Books, Serials and Documents are for textual content and labeled in the search index as 'text'. If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as "Images". These Content Types are used as facets for filtering search results.
The Resource Type used on image content should be one that rolls up to the Content Type “Images” and not one of the textual Content Types because text and images are differentiated in the search index and treated differently in search results and facets. If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as Content Type "Images", so a Resource Type is not absolutely necessary on image content items.
For items that are images depicting text documents, we recommend that you enable OCR for your project. Any metadata included with your item will be included in the search index as well. Search queries will match both the metadata and OCR data.
If your text item is a PDF, note that at this time we neither extract the text from these documents nor process them with OCR. See below for information about future improvements.
JSTOR recommends applying a Resource Type from our controlled list of values (available as a list in Forum). Each Resource Type then rolls up to a broader Content Type (Books, Documents, Images, and Serials). Books, Serials and Documents are for textual content and labeled in the search index as 'text'. If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as "Images". These Content Types are used as facets for filtering search results.
The Resource Type used on textual content should be one that rolls up to one of the text Content Types (“Books”, “Serials”, or “Documents”) and not “Images” because text and images are differentiated in the search index and treated differently in search results and facets.
If a JSTOR Resource Type is not selected, items will publish to JSTOR by default as Content Type "Images", so a textual Resource Type and Content Type is especially important for text content. If text content is mislabeled as Content Type “Images”, the items will be displayed as images in search results and will not be included in faceted filtering for textual content.
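The roll-up from Resource Type to Content Type, including the default to "Images", can be sketched as follows. The Resource Type values in the mapping are hypothetical placeholders; the real controlled list is the one available in Forum, and the function names are invented for this illustration.

```python
from typing import Optional

# Hypothetical Resource Type values; consult the controlled list in Forum
# for the real ones. Each rolls up to one of the four Content Types.
RESOURCE_TO_CONTENT_TYPE = {
    "Photograph": "Images",
    "Letter": "Documents",
    "Yearbook": "Serials",
    "Monograph": "Books",
}

# Content Types that are labeled as 'text' in the search index.
TEXT_CONTENT_TYPES = {"Books", "Documents", "Serials"}


def content_type(resource_type: Optional[str]) -> str:
    """Return the Content Type for an item; no Resource Type means Images."""
    if not resource_type:
        return "Images"
    # A value outside the controlled list raises KeyError in this sketch.
    return RESOURCE_TO_CONTENT_TYPE[resource_type]


def is_indexed_as_text(resource_type: Optional[str]) -> bool:
    """Return True if the item would be treated as textual content."""
    return content_type(resource_type) in TEXT_CONTENT_TYPES
```

For example, `content_type(None)` returns `"Images"`, so a letter published without a Resource Type would be faceted with images rather than with documents.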