Introduction
A common question in the Biblioscape Yahoo group and Forum is whether it is possible to import ordinary documents into Biblioscape. The answer is an unequivocal yes, but how it is done depends on the document format and layout, as well as on the way the user wishes to populate the database. It is clearly possible to create a reference for a single document and place its text within the document field. As the reader will realise, having worked through the exercises so far, it is also quite simple to create an import filter sufficient to deal with many formal documents formatted in a common way.
There is also no real difficulty in importing into Biblioscape a single structured document containing material that could form multiple references. The document simply needs to be converted into a formatted Biblioscape tag file, provided some time and care is applied to the process.
This final task provides an example of one method of importing such a document by utilising a word processor. The intention here is to provide a working example of a method that new computer users could easily use to import a formally structured file into a Biblioscape database, rather than to produce a finished set of references. Because of the constraints of many word processors the method is clearly not the best one, but it is easily achievable and will give most computer users sufficient experience to progress further if or as necessary.
Other methods of parsing could clearly be used, and may be more effective in many circumstances. Readers will need to assess their own skill level and needs in order to choose the method best suited to their circumstances. Using a program with regular expression functionality would further develop the basic knowledge gained in this document. Another relatively simple and reasonably easily learned method would be to use the macro functionality of a word processor, or a programming language.
Whilst the references created by this example will contain errors, the exercise does serve as a good example of what can feasibly be achieved by this method.
Having imported the references, the Biblioscape reference find and replace functionality will then be used with regular expressions to provide some experience in that area. Because this tutorial was compiled using Biblioscape Lite, the Global Edit function, which provides a very simple method of editing and formatting database fields, is not illustrated here. Global Edit is available within the Standard and higher versions.
Microsoft Word is used for this example. There would, however, be no difficulty in using any other word processor that provides the necessary functionality, any search and replace application, or any large file handling application; many such applications are available as freeware, and some also include regular expression functionality. Microsoft Word itself is not ideal, as it does not cope particularly well with larger files. Readers who need to deal with a particularly large file or collection of data will need to find a more suitable tool.
If researchers are to benefit from any imported data, large files should be imported into an active Biblioscape research project with greater care and thought than is illustrated by this example.
Biblioscape Tag File format
The Biblioscape help file contains details of the format of a Biblioscape Tag File. Basically, each record within a file is separated by a single line containing only six hyphens. Each database field starts on a new line with its tag, as listed within the help file, preceded by two hyphens and followed by two hyphens and a space, after which comes the field content.
e.g.
------
--AU-- Indicates author, so the author's details would appear here
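As a rough illustration of the layout just described, the following Python sketch writes records in tag file form: a line of six hyphens before each record, then one `--TAG-- content` line per field. Only `AU` is taken from the example above; the `TI` title tag is an assumption for illustration, so check all tag names against the crib sheet made from the help file.

```python
from typing import Iterable, Mapping

RECORD_SEPARATOR = "------"  # six hyphens on a line of their own


def format_record(fields: Mapping[str, str]) -> str:
    """Format one record: the separator line, then one '--TAG-- content' line per field."""
    lines = [RECORD_SEPARATOR]
    for tag, content in fields.items():
        lines.append(f"--{tag}-- {content}")
    return "\n".join(lines)


def format_tag_file(records: Iterable[Mapping[str, str]]) -> str:
    """Join formatted records into the text of a whole tag file."""
    return "\n".join(format_record(r) for r in records) + "\n"


if __name__ == "__main__":
    # Hypothetical record; "TI" is an assumed title tag.
    sample = [{"AU": "Roget, Peter Mark", "TI": "Thesaurus of English Words and Phrases"}]
    print(format_tag_file(sample), end="")
```

Run as a script, this prints the separator line followed by the `--AU--` and `--TI--` lines, which is the shape the word processor steps below aim to produce by hand.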
Preparation
First create a crib sheet of the Biblioscape database tags.
Open the Biblioscape help file, search for “Biblioscape Tag File” and open the item with that title. Copy the full item (with the cursor within the item, press CTRL+ALT+A then CTRL+ALT+C) and paste it into a word processor. Format it as necessary to reduce the contents to two sides of A4 paper; selecting the table, formatting it as two columns and then changing the font size to 10 will help. If you are unfamiliar with, or unwilling to spend time on, altering the page setup, note that the contents of the table are the items most needed.
In Bibliobrowser, open the Project Gutenberg site at:-
Open the “Online Book Catalog” link and in the EText-No: search box enter the number 22.
Once the page for the search result titled “Roget’s Thesaurus by Peter Mark Roget” has opened, capture the page as a reference in Web Page html only format.
Return to the Bibliobrowser and, from the bottom of the web page, download the Plain text zip file from one of the download links and save it to disk. This zip file may be needed a number of times.
Decompress the contents of the zip file into the C:\Documents and Settings\User Name\My Documents\Biblioscape Tutorial\attachments\References folder then, returning to the reference created a moment ago, add the resulting Roget15a.txt file as an attachment.
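For readers comfortable with a little scripting, the decompression step can also be done with Python's standard `zipfile` module. This is a minimal sketch: the zip file name `roget15a.zip` and its location are hypothetical, so substitute the file actually downloaded, and replace `User Name` in both paths with your own Windows user name.

```python
import zipfile
from pathlib import Path

# Hypothetical name and location of the downloaded Project Gutenberg zip file.
ZIP_PATH = Path(r"C:\Documents and Settings\User Name\My Documents\roget15a.zip")
# Attachments folder used in this tutorial ("User Name" is a placeholder).
TARGET_DIR = Path(r"C:\Documents and Settings\User Name\My Documents"
                  r"\Biblioscape Tutorial\attachments\References")


def extract_all(zip_path: Path, target_dir: Path) -> list[str]:
    """Extract every member of zip_path into target_dir and return the member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


if __name__ == "__main__":
    if ZIP_PATH.exists():
        for name in extract_all(ZIP_PATH, TARGET_DIR):
            print("Extracted:", name)
    else:
        print("Zip file not found:", ZIP_PATH)
```

Because the original zip is left untouched, the sketch can be rerun whenever a fresh copy of Roget15a.txt is needed.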
As it is not known what purposes, if any, the material imported by this filter may be put to, first carry out some tidying in line with the Project Gutenberg licence restrictions.
Open the Roget15a.txt file in Word, or in whichever word processor is being used.
Do not be concerned about making alterations to the document: if necessary, a fresh copy can be unzipped to replace any document damaged beyond repair whilst practising.
Search for the text “THESAURUS OF ENGLISH WORDS AND PHRASES” and then delete all the text above that line, back to the beginning of the document.
Next search for the text “*** END OF THE PROJECT” and delete all the text from the end of the last entry above that line to the end of the document.
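The two deletions above can be sketched in Python as keeping only the lines from the start marker up to, but not including, the end marker. This is a sketch under stated assumptions: the markers are matched case-sensitively, the file is assumed to be in the current directory and to be readable as latin-1, and any other trailer lines sitting between the last entry and the end marker would still need removing by hand.

```python
from pathlib import Path

START_MARKER = "THESAURUS OF ENGLISH WORDS AND PHRASES"
END_MARKER = "*** END OF THE PROJECT"


def find_line(lines: list[str], marker: str, start: int = 0) -> int:
    """Return the index of the first line at or after `start` that contains `marker`."""
    for i in range(start, len(lines)):
        if marker in lines[i]:
            return i
    raise ValueError(f"marker not found: {marker!r}")


def trim_gutenberg_text(text: str) -> str:
    """Keep the lines from the start marker up to, but not including, the end marker."""
    lines = text.splitlines()
    start = find_line(lines, START_MARKER)
    end = find_line(lines, END_MARKER, start + 1)
    kept = lines[start:end]
    # Drop trailing blank lines so the text finishes at the last non-blank line.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    path = Path("Roget15a.txt")  # assumed to be in the current working directory
    if path.exists():
        # latin-1 is an assumed encoding for this older plain text file.
        raw = path.read_text(encoding="latin-1")
        path.write_text(trim_gutenberg_text(raw), encoding="latin-1")
```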
Now to adjust and tidy the format somewhat.
Turn off the “check spelling as you type” and “check grammar as you type” options within the word processor so that they do not affect the process.
Formatting the File using the Word search and replace function
Note: To avoid potential problems with Word, do not immediately follow one search and replace with another. Save the Roget15a.txt file once each full search and replace action is completed.
Using the Search and Replace All option, carry out the following replacements.

N.B. Comments appear within parentheses and italics. They should not be input.
^p is Microsoft Word's special code for the paragraph mark (return and new line).
If Word starts to re-paginate, wait for it to complete before continuing.
Save the file retaining the .txt format.
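For readers using a script rather than Word, an ordered series of Replace All operations can be sketched as below. It translates Word's `^p` into a newline, which matches how Python reads Windows line endings in text mode, and applies each literal, case-sensitive replacement in turn. The sample pair in the usage is hypothetical, as is the whole helper; Word's other `^` codes (such as `^^` for a literal caret) are not handled.

```python
def word_to_python(pattern: str) -> str:
    """Translate Word's ^p paragraph-mark code into a newline character."""
    return pattern.replace("^p", "\n")


def apply_replacements(text: str, pairs: list[tuple[str, str]]) -> str:
    """Apply literal, case-sensitive Replace All operations in the order given."""
    for find, replace in pairs:
        text = text.replace(word_to_python(find), word_to_python(replace))
    return text


if __name__ == "__main__":
    # Hypothetical replacement: collapse three paragraph marks into two.
    sample = "first entry\n\n\nsecond entry"
    print(apply_replacements(sample, [("^p^p^p", "^p^p")]))
```

The order matters, just as with the sequence of searches in Word: an early replacement can create text that a later one then matches.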
Search for “QUANTITY BY COMPARISON WITH A STANDARD” and copy the two entries for “QUANTITY BY COMPARISON WITH A STANDARD”, including the % on the line above the section title.
Open Notepad, paste the copied portion into it and save it as a text file named “Quantity.txt” in the C:\Documents and Settings\User Name\My Documents\Biblioscape Tutorial\attachments\References folder.
Close the original file, open Quantity.txt in Word and, after ensuring “Match Case” is selected in the Search options, carry out the following find and replace actions in the order listed:-

Save the file.
Tidy up by resetting the spell checker and grammar options within Word.
Depending upon your hardware and software configurations it may be beneficial to reboot the computer at this stage.
Importing the Formatted File
Create a Reference folder in Biblioscape called “Roget’s”. (The “Add a folder” icon available in earlier versions is no longer available in the Import Bibliographies dialogue.)
From within the Biblioscape references module, select File|Import from the menu, or press CTRL+I, to open the Import Bibliographies dialogue.
From the Import filter drop down list select the “Biblioscape Tag File” filter (it will appear after the capitalised B entries), then from the “To Folder” drop down list select “Roget’s”. Browse to the edited text files and import the “Quantity” file, then the Roget15a file.
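Before importing, it can help to know roughly how many records each file should produce, so the result in the “Roget’s” folder can be checked. The sketch below counts separator lines and tally lines beginning with a field tag. It assumes tags consist of capital letters and digits and that the files are readable as latin-1; both are assumptions rather than details from the Biblioscape help file.

```python
import re
from pathlib import Path

SEPARATOR = "------"
# Assumption: a field line starts with --TAG-- and a space, TAG being capitals/digits.
FIELD_LINE = re.compile(r"--([A-Z0-9]+)-- ")


def summarise_tag_file(text: str) -> tuple[int, dict[str, int]]:
    """Count separator lines and how many lines start with each field tag."""
    records = 0
    tag_counts: dict[str, int] = {}
    for line in text.splitlines():
        if line.rstrip() == SEPARATOR:
            records += 1
            continue
        match = FIELD_LINE.match(line)
        if match:
            tag = match.group(1)
            tag_counts[tag] = tag_counts.get(tag, 0) + 1
    return records, tag_counts


if __name__ == "__main__":
    for path in (Path("Quantity.txt"), Path("Roget15a.txt")):
        if path.exists():
            count, tags = summarise_tag_file(path.read_text(encoding="latin-1"))
            print(f"{path.name}: {count} records, tag counts {tags}")
```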
Some final tidying up within the references.
To restrict the following actions to the newly imported references, ensure the focus within Biblioscape is on those references in the index view, then select Edit|Replace (CTRL+H).
From the “Limit find operation to field” drop down list select the “Document” entry, tick the regular expression selector and, after making each entry from the list below, select the “Replace All” button.
Make the following entries in the Find what textbox:-

Note – The regular expression declarations RE( and )RE are not required within the reference search and replace dialogue.
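The same idea of successive regular expression “Replace All” operations on the Document field can be sketched with Python's `re` module, where, as in the Biblioscape dialogue, the pattern is entered without any surrounding declarations. The two tidying patterns are hypothetical examples, not the entries intended for this exercise, and Python's regular expression syntax may differ in detail from Biblioscape's.

```python
import re

# Hypothetical tidying patterns, applied in order like successive "Replace All" clicks.
CLEANUPS: list[tuple[str, str]] = [
    (r"[ \t]+$", ""),  # strip spaces and tabs at the end of each line
    (r" {2,}", " "),   # collapse runs of two or more spaces into one
]


def tidy_document_field(text: str) -> str:
    """Apply each pattern in turn; MULTILINE makes $ match at every line end."""
    for pattern, replacement in CLEANUPS:
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    print(tidy_document_field("Existence.  Being,   \nentity. \t"))
```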
Remember that, while the judicious use of iterators within regular expressions is of value, great care should be taken to ensure runaway expressions do not occur, by following them with an effective terminator of some kind.
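A small Python illustration of a runaway expression and its terminated alternative follows. The `DOTALL` flag is used to simulate an engine where `.` also matches line breaks; whether Biblioscape's engine behaves that way is not stated here, so treat this as a worst-case demonstration rather than a description of Biblioscape.

```python
import re

# A two-field fragment of a tag file record.
text = "--AU-- Roget\n--TI-- Thesaurus\n"

# Runaway: with DOTALL, ".*" crosses line breaks and swallows the TI field as well.
runaway = re.sub(r"--AU-- .*", "--AU-- Anon", text, flags=re.DOTALL)

# Terminated: "[^\n]*" cannot pass a line break, and the trailing "\n" ends the match.
safe = re.sub(r"--AU-- [^\n]*\n", "--AU-- Anon\n", text)

print(repr(runaway))
print(repr(safe))
```

The runaway version leaves only `'--AU-- Anon'`, silently deleting the title line, while the terminated version gives `'--AU-- Anon\n--TI-- Thesaurus\n'`.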
This demonstration of a rough import and cleansing is now complete, so review the imported references. Any errors that exist indicate too little time spent preparing the script for the search and replace operations, rather than any fault with the method itself.
Clearly, a Biblioscape Tag File import first requires the file to be in text format. Most word processors allow documents to be saved as text, and converters are available for most other textual document formats. If in doubt about conversion, search the internet, e.g. for “pdf to text”.
Whilst importing in this way can be very flexible, users importing from any other bibliographic package should use the existing import filters and methods as documented within the help files.
When thinking of importing large quantities of references, consider the following extract from a thread entitled "Unstructured data" on the RECORDS-MANAGEMENT-UK@JISCMAIL.AC.UK list dated 2 August 2005:
“even a cursory examination of any filing cabinet is almost certain to reveal that nearly half of the stored records no longer have much relevance to current needs.” (Linton, J.E. Organising the office memory: The Theory and Practice of Records Management, University of Technology, Sydney, Kuring-gai Campus, Centre for Information Studies Publications p113). In a finding by Kalthoff and Lee from a survey made by the Dartnell Institute of the USA back in 1978 it was stated that:
65c in every dollar expended in record keeping and filing is wasted
70% more records are retained than are needed
85% of filed references are never referenced
95% of all references are to documents that are less than 3 years old and
45% of filing space is used to store duplicates and records of doubtful reference value.
(Kalthoff, R J and Lee, LS; Productivity and automation. Englewood Cliffs, NJ: Prentice Hall, 1981 p116).
Whilst this seems to be a rather old text to be quoting, there are some organisations that appear to still have problems with its paper based record keeping systems. As we mentioned in a previous edition: a study conducted by PricewaterhouseCoopers found that professionals spend 5-15% of their time reading information, but up to 50% of their time looking for pertinent data. The average organisation also:
· Makes 19 copies of each document it receives or produces;
· Loses 1 out of every 20 documents;
· Spends 25 hours recreating each and every lost document;
· Spends 400 hours per year searching for lost files; and
· Spends $120 in labour searching for each misfiled document
IDM: Image and Data Manager; September/October 2003 P53”
This was extracted from the July edition of Information Overload - http://www.iea.com.au
Equally, it can be argued that much information is lost through neglect or misapplication, which leaves researchers needing to know their data in sufficient detail at each appropriate level in order to gain value from it and create information useful for their research purpose.