The collection harvesting service can be added to other infrastructure services, such as shared collections, preserved collections, or JSTOR Forum. With it, ITHAKA brings your media and metadata into our system on your behalf.
The collection harvesting service includes:
- An initial feasibility check to confirm we're able to harvest your content.
- Syndication of your media and metadata into our system (at your direction) using an automated process. See Harvesting Options.
- We bring in your content exactly as you make it available. We do not review or edit the content, or add any editorial input, during this process.
- If sharing the content on JSTOR, we'll build landing pages and publish the collections to the JSTOR platform. We'll also create containers and/or compilations if directed by you at the beginning of the process.
- If you're unable to provide any of the landing page materials up front (logo, banner image, and description) or if you wish to make changes to your landing pages in the future, you may do so directly through JSTOR Forum or the collection loader tool.
- Note that collections containing containers or compilations appear as read-only in the collection loader tool.
- If preserving the content, we'll send the content to Portico for managed preservation.
- If using JSTOR Forum, we'll load the content into Forum, where it can be edited, cataloged, and published.
- To request updates to existing collections, email contributors@jstor.org.
Harvesting options
There are three options for harvesting your media and metadata.
Harvesting from a supported repository
We're usually able to harvest both media and metadata from these repositories, though a feasibility check is still required for confirmation. The list of repositories includes:
- CONTENTdm
- Digital Commons by bepress
- DSpace
- eScholarship
- Ex Libris
- Internet Archive
- Islandora
- LUNA
- Omeka
- Preservica
Requirements:
- The repository must not be password protected; if it is, the contributor must provide ITHAKA with a valid username and password.
- ITHAKA’s IP address must be able to access the repository. If access is restricted, we can provide our IP address so that the contributor can allow it.
- If the repository has an OAI-PMH endpoint, the endpoint must be enabled. We can provide instructions on how to enable it.
Harvesting from a repository with an OAI-PMH endpoint
We're able to harvest metadata from any repository with an OAI-PMH endpoint. In most cases we're also able to harvest the corresponding media. If we're unable to harvest the media, the contributor may send it to us via FTP (see Contributor provided via ITHAKA FTP).
Requirements:
- The repository must not be password protected; if it is, the contributor must provide ITHAKA with a valid username and password.
- ITHAKA’s IP address must be able to access the repository. If access is restricted, we can provide our IP address so that the contributor can allow it.
- The OAI-PMH endpoint must be enabled. We can provide instructions on how to enable it.
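If you'd like a rough idea of what an enabled OAI-PMH endpoint looks like from a harvester's side, the Python sketch below may help. It is only an illustration and not ITHAKA's harvesting process. The base URL, the function names, and the sample response are all hypothetical. The verb `ListRecords`, the `oai_dc` metadata prefix, the `resumptionToken` element, and the XML namespace come from the OAI-PMH 2.0 specification. The sketch builds a request URL and summarizes a response without making any network calls.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL (base_url is a hypothetical endpoint)."""
    query = urlencode({"verb": verb, **params})
    return base_url + "?" + query


def summarize_list_records(xml_text):
    """Summarize a ListRecords response: record count, paging token, or error."""
    root = ET.fromstring(xml_text)

    # A repository reports protocol-level problems in an <error> element.
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        return {"ok": False, "error_code": error.get("code"),
                "record_count": 0, "resumption_token": None}

    records = root.findall("oai:ListRecords/oai:record", OAI_NS)

    # A non-empty resumptionToken means more pages of records are available.
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token_text = (token.text or "").strip() if token is not None else ""

    return {"ok": True, "error_code": None,
            "record_count": len(records),
            "resumption_token": token_text or None}


# Hypothetical, heavily trimmed response used only for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_request_url("https://repository.example.edu/oai",
                            "ListRecords", metadataPrefix="oai_dc"))
    print(summarize_list_records(SAMPLE_RESPONSE))
```

If a similar request against your own repository returns an error page or times out instead of XML, the endpoint is probably not enabled or not reachable, which is what the feasibility check is meant to catch.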
Contributor provided via ITHAKA FTP
The contributor may provide their metadata and/or media via ITHAKA’s FTP server. We'll provide details for using the FTP server.
Requirements:
- Metadata must include a field that corresponds to the filenames of the media.
- There must be a one-to-one match between metadata records and media.
- Please do not use spaces or tabs in a filename, as this can cause errors during transfer.
- Your filenames should only include the following characters:
  - a-z
  - A-Z
  - 0-9
  - _ (underscore)
  - - (dash)
  - . (period)
- Compound objects must be created using the same filename with sequence numbers (in the pattern _1, _01, or _001) appended. Example: filename_001.jpg, filename_002.jpg, filename_003.jpg
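Before uploading, you may find it useful to check your files against the requirements above. The Python sketch below is one possible way to do that and is not a tool provided by ITHAKA. All function names are hypothetical. It assumes you already have two plain lists: the filenames referenced in your metadata and the filenames of your media. The compound-object grouping is a heuristic that treats any one- to three-digit suffix after an underscore as a sequence number.

```python
import re
from collections import Counter

# Allowed characters: a-z, A-Z, 0-9, underscore, dash, period.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.-]+")

# Sequence suffix for compound objects: _1, _01, or _001 at the end of the stem.
SEQUENCE_SUFFIX = re.compile(r"(?P<base>.+)_(?P<seq>\d{1,3})")


def is_valid_filename(name):
    """True if the name uses only allowed characters (so no spaces or tabs)."""
    return ALLOWED_NAME.fullmatch(name) is not None


def check_one_to_one(metadata_filenames, media_filenames):
    """Report mismatches between metadata records and media files."""
    meta = Counter(metadata_filenames)
    media = Counter(media_filenames)
    return {
        "invalid_names": sorted(n for n in media if not is_valid_filename(n)),
        "records_without_media": sorted(n for n in meta if n not in media),
        "media_without_record": sorted(n for n in media if n not in meta),
        "duplicate_references": sorted(n for n, c in meta.items() if c > 1),
    }


def group_compound(filenames):
    """Group files such as filename_001.jpg under their shared base name."""
    groups = {}
    for name in filenames:
        stem = name.rpartition(".")[0] or name  # drop the extension, if any
        match = SEQUENCE_SUFFIX.fullmatch(stem)
        if match:
            entry = (int(match.group("seq")), name)
            groups.setdefault(match.group("base"), []).append(entry)
    # Order each group's files by sequence number.
    return {base: [name for _, name in sorted(items)]
            for base, items in groups.items()}


if __name__ == "__main__":
    report = check_one_to_one(
        ["a.jpg", "b.jpg", "c.jpg"],       # filenames listed in the metadata
        ["a.jpg", "b.jpg", "my map.jpg"],  # filenames of the media to upload
    )
    print(report)
    print(group_compound(["filename_002.jpg", "filename_001.jpg", "cover.jpg"]))
```

An empty list for every key in the report suggests the metadata and media line up one-to-one and that every filename uses only the allowed characters.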
Post-publishing
For harvested content shared on JSTOR, we'll notify the participant once the content is published so that they can immediately review how it appears (for example, how the metadata was mapped).
We encourage participants to do this review because, as described above, our harvesting process is automated and JSTOR does not review the output of the publishing process.