We assume two databases are available. The first holds creation-related data (URN, OAMS metadata). (Note: for Open Archives, this database can be preloaded with existing metadata records.) The second is the citation database, which consists of (source, target) pairs of document ids. A document id could be the index into the first database at which the metadata for the citing or cited creation is found.
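As a rough illustration of the two databases, the sketch below keeps both in memory. The class names CreationDatabase and CitationDatabase, their methods, and the use of a list index as the document id are assumptions made for this example; they are not part of the specified API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for the creation database.
class CreationDatabase {
    // One record per creation: its URN and its OAMS metadata (kept as raw text here).
    static final class Entry {
        final String urn;
        final String oamsMetadata;

        Entry(String urn, String oamsMetadata) {
            this.urn = urn;
            this.oamsMetadata = oamsMetadata;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Stores a record and returns its index, which serves as the document id.
    int add(String urn, String oamsMetadata) {
        entries.add(new Entry(urn, oamsMetadata));
        return entries.size() - 1;
    }

    Entry get(int docId) {
        return entries.get(docId);
    }
}

// Hypothetical in-memory stand-in for the citation database.
class CitationDatabase {
    // A (source, target) pair of document ids: the source cites the target.
    static final class Pair {
        final int sourceId;
        final int targetId;

        Pair(int sourceId, int targetId) {
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    private final List<Pair> pairs = new ArrayList<>();

    void add(int sourceId, int targetId) {
        pairs.add(new Pair(sourceId, targetId));
    }

    // Forward links: the ids of all documents that cite the given target.
    List<Integer> citingIds(int targetId) {
        List<Integer> result = new ArrayList<>();
        for (Pair p : pairs) {
            if (p.targetId == targetId) {
                result.add(p.sourceId);
            }
        }
        return result;
    }
}
```

Using the list index as the document id matches the suggestion in the text, but any stable key would serve the same purpose.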
We now have the following private fields defined:
BibData myData // our document id
String myURL // Network address of our item
MIMEfile localMetaData // Original text fragments
ZJ: This lookup process may not be simple if the metadata collected from the reference is incomplete or contains errors. CiteSeer can 'predict' that the following three citations refer to the same paper (http://citeseer.nj.nec.com/check/814):
Quinlan J. R.: C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, California 1993
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Francisco, CA: Morgan Kaufmann.
J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
This kind of 'grouping' process is necessary in order to obtain the correct number of citations of a particular paper (i.e. forward links). Also, only after this process is it possible to "have a chance to correct/add some more metadata to the database for this document."
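To make the grouping idea concrete, here is a minimal sketch that treats two citation strings as the same work when their word sets overlap enough (Jaccard similarity). This is a stand-in heuristic chosen for illustration, not CiteSeer's actual method; the class name CitationMatcher, the four-character token filter, and the 0.6 threshold are all assumptions.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical grouping heuristic; not CiteSeer's actual algorithm.
class CitationMatcher {
    // Assumed cut-off; a real system would tune this on labelled data.
    static final double THRESHOLD = 0.6;

    // Lower-cases the citation and keeps alphanumeric tokens of four or more
    // characters, which drops initials and short words such as "for" or "CA".
    static Set<String> tokens(String citation) {
        Set<String> result = new HashSet<>();
        for (String t : citation.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (t.length() >= 4) {
                result.add(t);
            }
        }
        return result;
    }

    // Jaccard similarity: size of the intersection divided by size of the union.
    static double similarity(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ta);
        common.retainAll(tb);
        return (double) common.size() / union.size();
    }

    // True when the two strings are similar enough to be grouped together.
    static boolean sameWork(String a, String b) {
        return similarity(a, b) >= THRESHOLD;
    }
}
```

With this filter, each pair of the three Quinlan citations shares at least seven of ten distinct tokens, so all three fall into one group. A heuristic this simple would also merge distinct works with near-identical titles and publishers, which is one reason the grouped metadata still needs a chance to be corrected afterwards.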
If the reference was in the database already, then it has a URN. Construct a new Citation out of this reference by giving it the context[] and type of citation (REFERENCE). Use the reference's URN to locate the surrogates for copies of this creation. (This involves a call to a handle system.) For each surrogate on the list invoke its addCitation method, handing it the new Citation object.
ZJ: Maybe "construct a new Citation out of the current paper" is what was really meant? From the point of view of this reference, the current paper is its citation, so we need to turn this paper into a Citation object. Effectively, each paper creates a Citation object as long as it has at least one reference. This step and the addCitation call are also needed for references not already in the database (see below), I think.
If the reference is not already in the database, then we need to construct an OAMS metadata MIMEfile and add it to the database. Save the newly generated document id.
Finally, construct a BibData from the reference's document id and store it in referenceData.
For each reference, construct a new CiteRef from this document's id and the reference id, and add it to the citation database. (ResearchIndex would also generate a unique CID for this citation.)
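The per-reference steps above (look the reference up, add it if it is missing, then record the citing/cited pair) can be sketched as follows. The class ReferenceResolver, its methods, and the idea of a string "lookup key" produced by an earlier matching step are assumptions for illustration; the handle-system call, addCitation, and the OAMS MIMEfile construction are omitted or reduced to comments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of resolving references to document ids and recording pairs.
class ReferenceResolver {
    // Creation database stand-in: lookup key -> document id.
    private final Map<String, Integer> docIdByKey = new HashMap<>();

    // Citation database stand-in: each entry is {citing document id, cited document id}.
    final List<int[]> citeRefs = new ArrayList<>();

    // Returns the document id for a reference, adding a new record if it is unknown.
    int resolve(String referenceKey) {
        Integer existing = docIdByKey.get(referenceKey);
        if (existing != null) {
            return existing; // the reference is already in the database
        }
        // Stands in for constructing an OAMS metadata MIMEfile and storing it.
        int newId = docIdByKey.size();
        docIdByKey.put(referenceKey, newId);
        return newId;
    }

    // Resolves every reference of one citing document and stores a pair for each.
    void processReferences(int citingDocId, List<String> referenceKeys) {
        for (String key : referenceKeys) {
            int citedDocId = resolve(key);
            citeRefs.add(new int[] {citingDocId, citedDocId});
        }
    }
}
```

In this sketch both branches end with the same pair being recorded, which is in line with ZJ's remark that the citation step applies whether or not the reference was already in the database.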
At this point, we have a completed Reference object:
BibData referenceData // pointer into the creation database; a doc id
int ordinalNumber // which reference this is in this item
String origRef // how the reference was spelled in the text
String context[] // context strings from the text for this reference
RefEnum refType // NATURAL, AMBIGUOUS, CLEAR, or LINKABLE
Process each reference in the same way until the Surrogate's refList[] is complete.
ZJ: Steps 8-9 do not seem necessary? (Please see reasons below).
If this citation is already in our currentCitations we are done with this CiteRef. (We know it's in our list by matching up document ids.)
(ZJ: currentCitations is defined as knownCitations in the Java API specification.)
If it is not on our list, then we must construct a new Citation object and add it to our currentCitations. Constructing a new Citation object requires a document id, a set of context strings, and a citation type. We have the document id. Use it to access the surrogate corresponding to the citing creation.
ZJ: For me, the only possible situation where a <source_id, target_id> pair exists in the citation database (i.e. one of the two existing databases) but is not a CiteRef object is this: the <source_id, target_id> pair was created from information found in SCI (or possibly CiteSeer too?) and there are no 'surrogates' corresponding to the citing creations yet, right? If a <source_id, target_id> pair is also a CiteRef object, then the source document should already have been processed and turned into a Citation object, which was added to the currentCitations list of the target document's surrogate in step 7 when the surrogate for the source document was created. Therefore, performing addCitation in step 7 when dealing with each reference is important. From the referenced document's point of view, addCitation actually performs the jobs described in steps 8-9, i.e. building up a list of known citations for the surrogates of the referenced documents.
How do we do this access? First, we feed the citing creation's URN to our name server, which gives us URLs for all the surrogates for the creation that cites us. Pick one of the surrogates. Invoking its getRefID(MIMEfile citation-BibData) will return the complete Reference in the citing creation for which we were the target.
Turn that Reference into a Citation by invoking the static Surrogate.buildCitation( Reference ) method.
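The check-then-build logic for a single CiteRef might look like the sketch below. CitationSketch, SurrogateSketch, and CitingSurrogateLookup are simplified, hypothetical stand-ins for the real Citation and Surrogate types. The lookup interface collapses the name-server query, the choice of a surrogate, getRefID, and Surrogate.buildCitation into one call, which is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the Citation type.
class CitationSketch {
    final int citingDocId;
    final String[] context;

    CitationSketch(int citingDocId, String[] context) {
        this.citingDocId = citingDocId;
        this.context = context;
    }
}

// Stands in for: resolve the citing creation's URN via the name server, pick a
// surrogate, call getRefID on it, and pass the Reference to buildCitation.
interface CitingSurrogateLookup {
    CitationSketch fetchCitation(int citingDocId, int targetDocId);
}

// Hypothetical, simplified stand-in for the Surrogate type.
class SurrogateSketch {
    final int docId;
    final List<CitationSketch> currentCitations = new ArrayList<>();

    SurrogateSketch(int docId) {
        this.docId = docId;
    }

    // A citation is known if one with the same citing document id is on the list.
    boolean hasCitationFrom(int citingDocId) {
        for (CitationSketch c : currentCitations) {
            if (c.citingDocId == citingDocId) {
                return true;
            }
        }
        return false;
    }

    // Returns true if a new citation was added, false if it was already known.
    boolean processCiteRef(int citingDocId, CitingSurrogateLookup lookup) {
        if (hasCitationFrom(citingDocId)) {
            return false; // already in currentCitations; nothing to do
        }
        currentCitations.add(lookup.fetchCitation(citingDocId, docId));
        return true;
    }
}
```

Because the lookup is only invoked when the citing document id is not yet on the list, the remote call is skipped for citations that addCitation has already delivered.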
Constructors for Surrogate, Reference, Citation, BibData, Creation, CiteRef
getID()
getRefED() (ZJ: getRefID)
Protected methods used:
addCitation()
buildCitation()