Reindexing Project

This page will list the possible changes we might like to see if we do a reindexing project.

Please email Lloyd with any additional ideas for reindexing to add to this page.

Indexing Changes

1) Remove 074 from ISN index

  • It is very common for many government documents to have the same 074, so this is not a unique identifier. This results in many unhelpful hits in the Headings Report duplicate checker. Removing the 074 from the ISN index would unclutter Headings Reports, making them easier to use.
  • The 074 should still be indexed somewhere, maybe with the 086, in the Keyword index, or in its own index. Our gov doc number index is an item-level index, so the 074 can't be combined with it. Nothing at the bib level currently indexes gov docs.

2) Move 019 from ISN index to BIB UTIL index

  • The 019 is a field for old OCLC numbers. Currently, when we download a new version of an OCLC record with a new 001 field, the system does not recognize that an old version of the record already exists under the number now recorded in the 019. This both creates a duplicate and leaves an outdated record in the system. Putting the 019 in the BIB UTIL index would prevent that and make the new record overlay the old one.
  • Currently the 019 is indexed in such a way that only the first $a gets into the index. The 019 often has many more $a. They should all be put into the index.
  • Putting the 019 into the BIB UTIL index will also make duplicates between the 001 and 019 appear in Headings Reports.
  • A danger of this is that it becomes possible to overlay a new version of a record with an old one. That would only happen if someone loaded an old version of a record, perhaps supplied by a vendor. A loader for vendors that supply records not directly from OCLC could be set to ignore matches on the 019, which would prevent the most likely source of this problem.
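
The matching behavior described above can be sketched in a few lines of Python. This is only a conceptual model, not how Sierra load profiles are actually configured: the record dictionaries, the function names, the match_on_019 switch, and the control numbers are all made up for illustration. Each record has an "001" value and an optional "019" list holding every $a.

```python
def control_numbers(record, include_019=True):
    """Collect the 001 plus, optionally, every 019 $a of a record."""
    numbers = {record["001"]}
    if include_019:
        # All $a values are used, not only the first one.
        numbers.update(record.get("019", []))
    return numbers


def find_overlay_target(incoming, existing_records, match_on_019=True):
    """Return the id of the existing record the incoming one would overlay, or None."""
    incoming_numbers = control_numbers(incoming, match_on_019)
    for existing in existing_records:
        if incoming_numbers & control_numbers(existing, match_on_019):
            return existing["id"]
    return None


# Case 1: a new version arrives; the old number is the SECOND $a of its 019.
catalog = [{"id": "b1", "001": "ocm00000111"}]
new_version = {"001": "on0000000999", "019": ["ocm00000100", "ocm00000111"]}
print(find_overlay_target(new_version, catalog))                      # b1
print(find_overlay_target(new_version, catalog, match_on_019=False))  # None

# Case 2 (the danger): an old version arrives after the new one is in the catalog.
catalog_after = [{"id": "b2", "001": "on0000000999", "019": ["ocm00000111"]}]
old_version = {"001": "ocm00000111"}
print(find_overlay_target(old_version, catalog_after))                      # b2
print(find_overlay_target(old_version, catalog_after, match_on_019=False))  # None
```

In case 1 a result of None means the new record would be added as a duplicate, which is the current behavior. In case 2 the match_on_019=False setting stands in for a vendor loader that ignores 019 matches, so the old record would not overlay the newer one.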

3) Move 020$z into the keyword index instead of ISN index.

  • Currently, records carry ISBNs for many different versions of a work (paper, audio, ebook, etc.), so an ISBN match can cause a record to overlay one for the wrong format. People want to keep all the ISBNs to improve searching despite the risk of overlay errors. However, ISBNs for different formats are commonly put into $z of the 020 field. What if we put 020$z into the keyword index instead of the ISBN index? It would then no longer be used for overlaying. Because Pika doesn't depend on Sierra indexes, Pika could be set to treat 020$z like any other ISBN for searching in the public catalog. The change would only limit ISBN searching in the Classic catalog and in Sierra, and all the ISBNs could still be searched as keywords there.
  • Pika was previously set up specifically not to index the 020$z because it created problems. We need to decide which set of problems is worse.
  • Another option would be a second ISBN index.  ISBN-1 would be for only the correct ISBN for the item described, and ISBN-2 would also include all the 020$z that have bad ISBNs and ISBNs for different formats.  ISBN-1 would be a match point, while ISBN-2 would be searchable in the catalog.
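
As a rough illustration of the two-index option, the split could look like the Python sketch below. It assumes a simplified, hypothetical representation in which each 020 field is a list of (subfield code, value) pairs; the function name and the ISBN values are placeholders, and real indexing rules would also need to normalize the values (for example, strip qualifiers), which is left out here.

```python
def split_isbn_indexes(fields_020):
    """Return (isbn_1, isbn_2) entries for a record's 020 fields.

    isbn_1: only 020$a, intended as the overlay match point.
    isbn_2: 020$a plus 020$z, intended for catalog searching only.
    """
    isbn_1, isbn_2 = [], []
    for field in fields_020:
        for code, value in field:
            if code == "a":
                isbn_1.append(value)
                isbn_2.append(value)
            elif code == "z":
                isbn_2.append(value)
    return isbn_1, isbn_2


# One record describing the print edition, with the ebook ISBN in $z.
fields = [
    [("a", "9780000000001")],
    [("z", "9780000000002")],
]
isbn_1, isbn_2 = split_isbn_indexes(fields)
print(isbn_1)  # ['9780000000001']
print(isbn_2)  # ['9780000000001', '9780000000002']
```

A loader matching only on ISBN-1 would never overlay this record on the ebook ISBN, while a search against ISBN-2 would still find it.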

4) Reindex patron data

  • Telephone and email address

5) Remove $0 and $1 from all indexes

  • The $0 was added into most of our records by the Marcive authority record process.  While we hope it will someday be useful for linked data, at this time it creates problems in the classic catalog. 
  • For one thing, they appear on hit lists, where they are confusing and ugly. 
  • Also, if one record has the URI and another, with the same heading does not, then the index creates two separate hits on the subject browse.
  • We don't have any $1 data yet, but we anticipate it will appear at some point.
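
The split-heading problem can be shown with a small Python sketch. It is a simplified model rather than Sierra's actual browse normalization: headings are represented as lists of (subfield code, value) pairs, browse_key is a hypothetical helper, and the URI is a placeholder.

```python
def browse_key(subfields, skip_codes=("0", "1")):
    """Join subfield values into a browse key, leaving out the skipped codes."""
    return " ".join(value for code, value in subfields if code not in skip_codes)


# The same subject heading, once with a $0 URI and once without.
with_uri = [("a", "Dogs"), ("0", "http://example.org/placeholder-uri")]
without_uri = [("a", "Dogs")]

# Indexing every subfield produces two distinct browse entries.
keys_today = {browse_key(h, skip_codes=()) for h in (with_uri, without_uri)}
# Leaving $0 and $1 out of the index collapses them into one.
keys_proposed = {browse_key(h) for h in (with_uri, without_uri)}

print(len(keys_today), len(keys_proposed))  # 2 1
```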

6) Do we need any of these indexes:

  • Gov doc number (g) separate from local call number (m)?
  • UPC (j) separate from ISN (i)?
  • TitleKey (k)?
  • Author/Title (q)?
  • CARL BID (v)?
  • Course ID (x)?
  • Other (y)?

7) Are there item record fields we would like to index differently?

  • Maybe if we move the 590 into item records?
  • Maybe 583?

 

Field Group Tag Changes

1) Create separate Field Group Tag for 995 field

  • This is the field where we note the load profile used to load a record and the month and year it was loaded. Putting this in an index would make this information more useful.

2) Change the label on the y FGT to 'misc' instead of 'marc'.

3) Field Group Tag for 590 fields that would allow us to sort by them in Global Update.

4) Eliminate CARL INFO field group tag

  • We don't know what these are for.  We assume they were useful for the transition from CARL to Innovative, but no longer serve any purpose.

5) Eliminate TOC DATA field group tag

  • I think this is intended to handle the specialized authorized table of contents data fields you used to be able to get from Blackwell's. I don't believe we have any of this data in our system, and I don't know if any vendors still even provide it.

6) 003 field group tag

  • This would allow grouping in Global Update by the source of the Bib Util number.

 

Last updated on 11/14/2017