Perl Modules For Reference Processing
The OpCit
project aims to develop software for reference linking for
Open Archives. The main tasks
involved in reference processing and linking are:
- Extract references from the documents. References are usually
listed at the end of a document, under a heading such as
References, REFERENCES, or Bibliography.
However, there are exceptions: the reference section may start
after a horizontal line, or the references may appear as footnotes.
(Extract Reference)
- For each reference, extract metadata (author names, journal title,
volume, etc.) from it.
(Citation Parser)
- Use the metadata to query the citation database for on-line copies of the referenced documents.
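The first task above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the project's Perl code. It assumes the reference section starts at a heading line reading "References" or "Bibliography" (any capitalisation), and that each reference begins with a marker such as "[1]", "(1)" or "1.". The function name, the marker styles and the sample text are hypothetical, and the exceptions mentioned above (horizontal lines, footnotes) are not handled.

```python
import re

# Heading words come from the task list above; the exact pattern is an assumption.
HEADING = re.compile(r"^\s*(references|bibliography)\s*:?\s*$", re.IGNORECASE)
# Hypothetical reference markers: "[1]", "(1)" or "1." at the start of a line.
MARKER = re.compile(r"^\s*(?:\[\d+\]|\(\d+\)|\d+\.)\s+")


def extract_references(text):
    """Return the references found after the last heading line, one string each."""
    lines = text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if HEADING.match(line):
            start = i + 1  # keep the last match: references sit at the end
    if start is None:
        return []  # no recognised heading; exceptions are not handled here

    refs = []
    for line in lines[start:]:
        stripped = line.strip()
        if not stripped:
            continue
        if MARKER.match(line):
            # A new reference starts: drop the marker and keep the rest.
            refs.append(MARKER.sub("", line, count=1).strip())
        elif refs:
            # A continuation line: join it to the current reference.
            refs[-1] += " " + stripped
    return refs


sample = (
    "Some body text.\n"
    "\n"
    "References\n"
    "[1] A. Author, Phys. Rev. D 50,\n"
    "    1234 (1994).\n"
    "[2] B. Writer, Nucl. Phys. B 100, 5 (1980).\n"
)
for ref in extract_references(sample):
    print(ref)
```

Each returned string is one raw reference, ready to be passed to a citation-parsing step.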
Presently we are looking at documents from the Physics archives of
arXiv.org. References in these documents
are often terse (e.g. they contain no article title), and some
need special processing (see arXiv Specific).
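To make the idea of a terse reference concrete, here is a minimal Python sketch, again for illustration only and not the project's Citation Parser. It pulls metadata out of a reference that has no article title. It assumes a single layout, "authors, journal volume, page (year)"; real references vary far more. The pattern, the function name and the sample reference are all hypothetical.

```python
import re

# Assumed layout: "<authors>, <journal> <volume>, <page> (<year>)".
TERSE = re.compile(
    r"^(?P<authors>.+?),\s+"
    r"(?P<journal>[A-Za-z][A-Za-z.\s]*?)\s+"
    r"(?P<volume>[A-Z]?\d+)\s*,\s*"
    r"(?P<page>\d+)\s*"
    r"\((?P<year>\d{4})\)"
)


def parse_terse_reference(ref):
    """Return a dict of metadata fields, or None if the layout does not match."""
    match = TERSE.match(ref.strip())
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


# Made-up sample reference, not taken from any real document.
print(parse_terse_reference("A. Author and B. Writer, Phys. Rev. D 50, 1234 (1994)."))
```

Returning None for unmatched input makes it easy to route such references to archive-specific handling instead of guessing at their fields.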
Documentation
Source Code
(Z. Jiao, 03/08/01)