Introduction.
The reader will have gained some familiarity with the filter dialogue windows when building the filter in ABC Import Filters 4. The object of compiling this filter is to explore the Import Filters dialogue windows and their functionality further. Methods that cause the filter definitions to parse in different ways will be explored and explained; understanding these differences is necessary if regular expressions are to be used effectively. To support that understanding, a step by step approach will continue to be used for the moment, and the filter being compiled will be tested at various stages during its development. This gives a view of the outcomes of filter actions and of the causes of some common errors. To minimise complexity and maximise the potential for understanding, this filter will contain one sub-filter only.
The web page used is structured, so it will not be difficult to work with, even though complex filters will be used. Further insight into the Biblioscape reference database field structure and names, and their relationship with the import filters, will also be gained.
Preparation.
Before beginning, to simplify later work, set the Views menu so that the Preview Pane Header (View|Preview Pane|Preview Header) is visible and the Live Preview tab is selected. The Preview Pane may be resized in the normal way, and in the Formatted tab the preview area may be resized by dragging the separation bar which appears within the Formatted preview window.
It may be helpful to create a reference field names crib sheet containing the extract of the Options|Reference Types tab illustrated for use in this example on the next page.
Recall the file mentioned previously, C:\Program Files\Biblioscape 7\Global\webAddr.txt, which contains details of sites visited in the BiblioBrowser. It is useful here because a number of URLs to be referenced can be kept in that list. Open the webAddr.txt file in Notepad, paste the following two links at the top of the list, then save the file and close it.
http://machaut.uchicago.edu/?resource=Roget%27s
http://machaut.uchicago.edu/?resource=Webster%27s
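For readers who prefer to script this step, the following is a minimal Python sketch of placing links at the top of a list file. It assumes the file holds one URL per line and is plain text readable as UTF-8; neither point is confirmed here for webAddr.txt, and the function name prepend_links is purely illustrative. Editing the file in Notepad as described above remains the documented route.

```python
from pathlib import Path


def prepend_links(path, links, encoding="utf-8"):
    """Insert links at the top of a one-URL-per-line text file.

    Links already present in the file are skipped so that running the
    function twice does not create duplicates. Returns the links added.
    """
    file_path = Path(path)
    if file_path.exists():
        existing = file_path.read_text(encoding=encoding).splitlines()
    else:
        existing = []
    new_links = [link for link in links if link not in existing]
    # New links go first, followed by the original list in its original order.
    file_path.write_text("\n".join(new_links + existing) + "\n", encoding=encoding)
    return new_links
```

It would be called with the path to a copy of the list file and the two links shown above; working on a copy first is a sensible precaution.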
Open BiblioBrowser and select the Roget’s link, which is now listed, from the browser URL drop down list to open that web site. Then copy that URL from the browser address bar to the clipboard.
Create a Biblioscape Internet Resource for:-
Title - Roget’s Thesaurus 1911
URL - Paste the URL from the clipboard
Subject - Etymology
Access - Free
Now select the Webster’s link (second link down) from the BiblioBrowser URL drop down list to open that site, and copy that URL as well.
Create a Biblioscape Internet Resource for:-
Title - Webster’s Revised Unabridged Dictionary
URL - Paste the URL from the clipboard
Subject - Etymology
Access - Free
The Roget’s site will be used later for the next, slightly more complex, filter. For the moment, with the Webster’s web page open and both editions selected, search for “Privacy”.
Building the Import Filter.
Select the “Capture Page” button and the “As a Reference” selection. When the save dialogue opens save as a “Web Page, HTML only” into the C:\Documents and Settings\User Name\My Documents\Biblioscape Tutorial\attachments\References\HTML folder.
Open the Web page stored on disk in the default browser by selecting the View Attached File icon (paper clip) visible on the right of the reference header bar in the preview window of the new reference.
Re-ordering the creation process from now on, take a few minutes at this stage to consider the newly captured reference and determine:-
· what type of reference it will become;
· which database fields the data contained in the reference should populate;
· which of the data initially appear useful as tags.
The reference type used could be influenced by a number of factors. For simplicity during this example “Book edited” will be used for these particular references.
Creating a duplicate reference of the import and using cut and paste to populate the fields can be a simple way of determining some answers to the questions, although at this stage do leave the original reference unchanged. (The create duplicate icon available on the toolbar, which becomes visible when an individual reference is open, can be used, although the document field content is not copied by that process so would need to be cut and pasted across.)
New users may find it beneficial to obtain a printout of the imported reference rich text document so notes can be made. Using coloured marker pens to denote intended tags or fields can be useful. Otherwise make a rough list of the requirements.
The following tags at the beginning of lines appear to be available within the Webster’s privacy search record:-

Having completed the rough initial tag identification process return to Biblioscape and in the Rich Text tab of the unedited captured reference select all and copy to clipboard. (CTRL+A, CTRL+C.) Close the reference window.
Open the import filters dialogue window - File|Import Filters or CTRL+Shift+M.
Select the “New” button under the Main Import Filters (left hand) list box to create the new filter.
When the “Define Import Filter Type” dialogue opens paste the clipboard entry containing the document page into the “Examples” field at the bottom of the dialogue and then name the filter “Webster’s Revised Unabridged Dictionary”.
In the “Based On” field, type “Tutorial”
Complete the “Provider” field.
Complete the “Last Update” field by entering the current date, e.g. 20050425 (Year Month Day).
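The Year Month Day layout used for “Last Update” is simple to generate rather than type. A small Python sketch follows; the function name last_update_stamp is an illustration only and is not part of Biblioscape.

```python
from datetime import date


def last_update_stamp(day=None):
    """Return a date as an eight-digit YYYYMMDD string, e.g. 20050425."""
    if day is None:
        day = date.today()
    # %Y is the four-digit year, %m and %d are zero-padded month and day.
    return day.strftime("%Y%m%d")


example = last_update_stamp(date(2005, 4, 25))
```

Here example holds the same value as the date shown in the step above.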
Choose the new filter as a favourite immediately by selecting the greyed out box colouring it white and retaining the tick.
In the “Record” tab select “First Tag” and in the text box enter “Displaying ” (There is a trailing space.)
In the “Replace and Remove” tab type “a” and select the + button. From the “Available “Field”” drop down list select “Journal/Secondary Title” and then replace the “a” by selecting the <- button. In the “Replace with” text box type “^Webster's Revised Unabridged Dictionary”. Select the tick button.
In the “Replace and Remove” tab type URL and add it as an additional entry to the “Changes List” by selecting the + button. Then in “Replace with” textbox enter http://machaut.uchicago.edu/?resource=Webster%27s and select the tick button.
Whilst in the “Replace and Remove” tab at this time, review the list of “Available “Fields”” in the drop down list. Notice the field types listed consist of the default reference field names and that it is possible to type single characters to scroll the list quickly.
The other two tabs in the “Define Import Filter Dialogue” window, “Date and Others” and “Authors and Keywords”, will have the default option “Smart parsing” selected; those should be left as they are.
Select OK to close the “Define Import Filter Dialogue” window.
With “Webster’s Revised Unabridged Dictionary” selected in the Main Import Filters List, in the “Sub-Filters List” select “Book Section” from the “Reference Type” drop down list which appears when the field is selected and then make “Book Section” the “Default Reference Type” for this filter.
With “Webster’s Revised Unabridged Dictionary” selected in the Main Import Filters List and “Book Section” selected in the “Sub-Filters List” select the sub-filters edit button.
Notice that because the sub-filter reference type is “Book Section”, the field title names in the “Map Tag fields to Data fields” area of the sub-filter window now reflect the “Book Section” reference title names, which remain listed in the order they appear within the database. Refer to the crib sheet to confirm.
In the sub-filters “Document” field area of the “Map Tag fields to Data fields” column type “Displaying ” (with a trailing space). Remember it is possible and preferable to cut and paste from the example to reduce the possibilities of errors.
Notice that when typing text into those fields, spaces are immediately indicated by a “·”. However if the text is cut and pasted into the field spaces appear as normal until the sub-filter dialogue window is closed and re-opened.
Look at the crib sheet to see which generic field name relates to the “Book Title” reference field. Recall the replace or remove entry made earlier to the “Journal/Secondary Title” database field, which achieves the necessary input. Knowledge of the database field names is important, as without it the field names can seem to vary confusingly. Use the crib sheet to help as needed.
Note. It is possible to use required text as a tag identifier provided replace or remove is also used to replace that tag text within the database field and the text is consistent across all records on the web site in question.
With the “First Tag” being the initial record identification, good practice would seem to be to test the filter before moving on. Testing is possible at this point because the Replace entries cause other parsing to occur.
Select OK to exit all of the Import Filter dialogues and then CTRL+ALT+W to switch to the web browser module.
First associate the filter with the Internet Resource created earlier by right mouse click on the “Webster’s Revised Unabridged Dictionary” in the Internet Resources and select “Properties” from the drop down menu to open the “Bibliographic Source on the Web” dialogue window with the “Webster’s Revised Unabridged Dictionary” entry pre selected and ready to be edited.
From the drop down list in the “Import Filters” field select the “Webster’s Revised Unabridged Dictionary” filter. Note this drop down list allows character(s) to be typed thereby moving the list pointer to entries beginning with the character(s). So typing “We” in the drop down list box will quickly reveal the filter.
Now using the “Capture References” button and selecting the new import filter, import the two references contained on that web page.
CTRL+ALT+R to change back to the References module.
In the references view, select the first new reference and view it. Changing views to “User Defined” see the filter has worked with the “Book Title” and “URL” fields completed by the Search and Replace actions. The “Rich Text/Document” field contains the intended data, plus, because no other tag exists, all the other data from the web page up to the second occurrence of the tag.
If the filter does not work at this point, return to it and check that the “First Tag” entry duplicates the sub-filter “Document” “Map Tag fields to Data fields” field entry; use cut and paste if necessary to assure duplication. Also confirm that the correct Replace or Remove entries exist.
Notice that in the “Document” field other pieces of information appear which could be used to populate other database fields; Perfect!
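The behaviour seen in this test, where each reference's Document field holds everything from one occurrence of the first tag up to the next, can be pictured with a short Python sketch. This is a simplified model and not Biblioscape's actual code: the function name split_records, the sample text, and the choice to ignore any text that precedes the first tag are all assumptions made for illustration.

```python
def split_records(text, first_tag):
    """Split text into records, each starting at a line that begins with first_tag."""
    records = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith(first_tag):
            # A new record starts here, so close the previous one.
            if current is not None:
                records.append(current)
            current = line
        elif current is not None:
            # No other tag exists, so every following line joins the open record.
            current += line
        # Lines before the first tag belong to no record (assumed behaviour).
    if current is not None:
        records.append(current)
    return records


# Hypothetical page text; only the tag and edition wording follow the tutorial.
sample = (
    "Search page header\n"
    "Displaying 1 result(s) from the 1913 edition:\n"
    "first entry text\n"
    "Displaying 1 result(s) from the 1828 edition:\n"
    "second entry text\n"
    "page footer text\n"
)
records = split_records(sample, "Displaying ")
```

In this model the last record also collects the page footer, which is consistent with trailing page text turning up in the final reference on a page.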
The 1913 and 1828 text portions of the “Rich Text/Document” fields could be utilised in the database “Year” field. To do that some complex field tags are required, and as complex field tags need not appear at the beginning of a line, more flexibility becomes available. However, any parsing actions must be conducted strictly in the order of appearance of the tags.
Identify text which could potentially be used as complex field tags.
Safe in the knowledge the initial record identification and parsing criteria are correct SHIFT+CTRL+M to re-open the Main Import Filters dialogue window.
With “Webster’s Revised Unabridged Dictionary” selected in the Main Import Filters List and “Book Section” selected in the “Sub-Filters List” re-open the “Sub-Filter Dialogue” window, select the “Document” entry and then the “Complex Fields” tab.
Notice that the complex fields “Available Fields” list reflect the “Book Section” reference title names but remain listed in the order they appear within the database. The field names can be selected by either typing one character after having selected one item within the list (typing the same character again will move to the next entry starting with that character), or using the scroll bar. Reference to the crib sheet or help file will assist with these lists until the reference field names and their order become familiar.
From the “Available Fields” list select “Year” and the Insert button, then in the “ID text before selected field” enter “from the ” (There is a trailing space.) and in the “ID text after selected field” enter “ edition:” (There is a leading space.).
Now from the “Available Fields” list select “Title” and the Insert button. Ensure the “Year” entry appears above the “Title” entry in the “Parsing sequence” list, adjusting if necessary with the “Up” or “Down” buttons.
Notice that when “Title” is first inserted in the “Parse sequence” list the “ID text before selected field” and “ID text after selected field” fields may appear to contain the same text as the “Year” entry. This is merely a misleading display issue as changing to the Year entry and back to the Title entry would illustrate.
In the “Title” “ID text before selected field” field enter “ edition:” (With a leading space) and in the “ID text after selected field” field enter “(Page: ” (With a trailing space).
Note in “Complex Fields”, text utilised in the “ID text after selected field” field remains available for further parsing use within the same set of complex fields.
Now select the “Start Page” entry from the available fields and add it to the parse sequence. In the “ID text before selected field” enter “(Page: ” (With a trailing space) and in the “ID text after selected field” field enter “)”.
Finally select the “Document” entry from the available fields and add it to the parse sequence. In the “ID text before selected field” enter “)”.
Remember cut and paste from the example box is possible but be cautious no carriage return, new line or paragraph mark is included.
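The parsing sequence just entered can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the Biblioscape implementation: the function name parse_complex_fields, the cursor-based search, and the sample record (including its page number and body text) are invented for illustration. It does reflect two points made in the text: entries are processed strictly in order, and the “ID text after selected field” is left in place so the next entry can use it as its “ID text before selected field”.

```python
def parse_complex_fields(text, sequence):
    """Extract fields in order from text.

    sequence is a list of (field_name, id_before, id_after) tuples.
    """
    fields = {}
    cursor = 0
    for name, id_before, id_after in sequence:
        start = text.find(id_before, cursor)
        if start == -1:
            fields[name] = ""  # ID text not found at or after the cursor
            continue
        value_start = start + len(id_before)
        value_end = text.find(id_after, value_start) if id_after else -1
        if value_end == -1:
            value_end = len(text)  # no closing ID text: take the remainder
        fields[name] = text[value_start:value_end]
        # The closing ID text is not consumed, so it stays available
        # to the next entry in the sequence.
        cursor = value_end
    return fields


# Hypothetical record text shaped like the tutorial's search result.
sample_record = (
    "Displaying 1 result(s) from the 1913 edition:\r\n"
    "Privacy (Page: 123)\r\n"
    "Sample definition text."
)
sequence = [
    ("Year", "from the ", " edition:"),
    ("Title", " edition:", "(Page: "),
    ("Start Page", "(Page: ", ")"),
    ("Document", ")", ""),
]
fields = parse_complex_fields(sample_record, sequence)
```

In this model the Title value keeps the carriage return and new line that sit between “ edition:” and the word itself, which is why a captured title can arrive with non-printing characters attached.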
Again OK out of the filters dialogues, CTRL+ALT+W and once more “Capture References”
CTRL+ALT+R and view the new references.
The title field for the 1913 reference should now contain “Privacy” with some carriage return and new line characters. We will learn how to deal with those non-printing characters within an import filter once we start using regular expressions. The title for the 1828 entry will be blank.
SHIFT+CTRL+M to open the Main Import Filters dialogue window.
With “Webster’s Revised Unabridged Dictionary” selected in the Main Import Filters List and “Book Section” selected in the “Sub-Filters List” re-open the “Sub-Filter Dialogue” window.
To gain an understanding of how complex fields differ from other sub-filter fields: with the “Document” field selected in the “Map tag fields to data fields” list, select “Publisher” from the complex fields “Available Fields” list and then the “Insert” button, and position the “Publisher” entry at the bottom of the parsing sequence list. Leave both the “ID text before selected field” and “ID text after selected field” blank.
Change back to the “Match Fields” tab and copy “Displaying ” from the “Document” field into the “Publisher” field so both fields contain that same tag. Notice that both fields now give access to the same set of complex fields criteria. If the Book Title field were selected and then the complex fields tab a different set of complex fields data would open.
Note - This particular method of duplicating sub-filter tags across database fields is only available for use within simple filters which have only one sub-filter. Filters with multiple sub-filters require a different approach.
Recall that in the previous filter, where no complex fields existed for a tag, it was possible to import the same tagged data into more than one database field.
Note - It is not possible to duplicate the same data across multiple fields from one tag when complex fields are used.
Now OK out of the filters dialogues, CTRL+ALT+W and once more “Capture References”
CTRL+ALT+R and view the new references.
Note the new references may be easily identified by the “Read” icon, or the “Reference ID” number. Sorting the display by the “Reference ID” column in ascending order will help assure each new reference always appears at the top of the list. If that column is sorted in descending order the read icon will always appear on the unread reference at the bottom of the list. Remember this point when working with filters in this way as, if the criteria of multiple references within the index view is the same it is very easy to view the wrong reference whilst working with filters.
If the “Reference ID” column is not visible within the references index, select the View|Current View|Field Chooser menu item to open the customize box and drag the Ref ID column item to the required position on the reference index column list. Whilst the customize box is open any of the listed columns may also be dragged to re-order them as necessary.
The Publisher field will contain whatever data is left over (and hence discarded) after the complex field parsing within the “Displaying ” tag field. e.g. “1 result(s”
Complex field filters do not remove data from within the sub-filter tagged field unless the data appears within the “ID text before selected field” or is placed within a database field.
Data used in the “ID text after selected field” is not removed and remains available to any other complex field used within that tag although it is not available for use outside of that complex field group.
As has been demonstrated with this filter it is perfectly possible to construct a filter using one initial tag and use the complex fields to populate all the database fields. That method can make the filter compilation process unnecessarily complex, and some formatting can be lost as will be seen later, although it can work well on some simple sites.
Now notice that the 1828 reference has the text “The ARTFL Project 5720 South Woodlawn Chicago, Illinois 60637 The University of Chicago Department of Romance Languages and Literature” in the Rich Text Document field. That information could be useful in the Publisher field of both references but only appears once within the web page. Utilising the Replace or Remove functionality constructively to alter the document prior to the database parsing would enable that information to be used in all the references created so that will be done now.
SHIFT+CTRL+M to open the Main Import Filters dialogue window.
With “Webster’s Revised Unabridged Dictionary” selected in the Main Import Filters List (Recall the “Favourites” button can assist in bringing the filter into view) select the main filter edit button then select the Replace or Remove tab.
In the “Limit changes to tag or data field” type “Displaying ” (With a trailing space.) and select the plus button.
In the “Find what:” field enter “ result(” (With a leading space.) and in the “Replace with:” field “ result( --SA--The ARTFL Project,--PP--5720 South Woodlawn Chicago, Illinois 60637 --PB--The University of Chicago Department of Romance Languages and Literature --END--” (With a leading space.). Remember to select the tick button to complete the changes.
The --SA--, --PP-- and --PB-- are Biblioscape tag file format field identifiers which are used here for convenience only. The comma prior to the --PP-- is to stop Biblioscape smart parsing the editor’s name.
Now change the parsing order in the Replace or Remove tab by selecting the Journal/Secondary Title entry and the + button then the URL entry and the + button. Having done that delete the original entries for those two items leaving the list in the order of:-
Displaying
Journal/Secondary Title
URL
Close the main Import Filters dialogue window and open the Sub-Filter dialogue window.
Select the “Publisher” field and open the complex fields tab.
From the available fields list add Editors and City to the parsing sequence. Adjust the parsing sequence as necessary so that the items are in the following order:-
Editors
City
Publisher
Year
Title
Start Page
Document
Select the Editors entry and type --SA-- in the “ID text before selected field” and --PP-- in the “ID text after selected field”.
Select the City entry and type --PP-- in the “ID text before selected field” and --PB-- in the “ID text after selected field”.
Select the Publisher entry and type --PB-- in the “ID text before selected field” and --END-- in the “ID text after selected field”.
Test the filter again to ensure the publisher and city fields are populated. Clearly not all the information contained in the city field would ordinarily be required.
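The combined effect of the Replace or Remove entry and the three marker-delimited complex fields can be sketched in Python. The replacement string is the one given in the steps above; the function names inject_markers and between, the sample record, and the decision to replace every occurrence of the find text are assumptions for illustration and may not match how Biblioscape applies the change.

```python
FIND_TEXT = " result("
REPLACE_TEXT = (
    " result( --SA--The ARTFL Project,--PP--5720 South Woodlawn Chicago, "
    "Illinois 60637 --PB--The University of Chicago Department of Romance "
    "Languages and Literature --END--"
)


def inject_markers(record_text):
    """Insert the publisher details, wrapped in markers, into a record."""
    # Every record contains the find text, so every record gains the details.
    return record_text.replace(FIND_TEXT, REPLACE_TEXT)


def between(text, id_before, id_after):
    """Return the text found between two ID strings, or '' if absent."""
    start = text.find(id_before)
    if start == -1:
        return ""
    start += len(id_before)
    end = text.find(id_after, start)
    return text[start:end] if end != -1 else text[start:]


# Hypothetical record text shaped like the tutorial's search result.
record = "Displaying 1 result(s) from the 1913 edition:\r\nPrivacy"
marked = inject_markers(record)
editors = between(marked, "--SA--", "--PP--")
city = between(marked, "--PP--", "--PB--")
publisher = between(marked, "--PB--", "--END--")
```

The city value here carries the whole street address, matching the observation that not everything placed in the City field would ordinarily be wanted.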
Any duplicate tag entry created in a simple sub-filter “Map Tag fields to Data fields” field is not so much for filter function as to support the user during filter creation and editing. It can be useful, although, as mentioned earlier and as will be seen later, it is not available in the more complicated filters.
Important Note. An issue to remember is that if a single tag field containing associated complex fields which do not appear elsewhere is to be deleted, it is very important that the associated complex fields are deleted first, otherwise the complex fields appear to remain within the sub-filter, although not visible. Access to any hidden complex fields of that type can be regained by re-creating the original “Map Tag fields to Data fields” tag.
Now to tidy up - SHIFT+CTRL+M to re-open the import filters dialogue window.
With “Webster’s Revised Unabridged Dictionary” selected in the Main Import Filters List select the “Edit” button from the Main Import Filters button selection.
Select the “Replace or Remove” tab and add “Document” to the bottom of that list. Then in the “Find what:” field enter “edition:” and select the tick button, ensure the “Replace with:” field is left blank.
So far the filters have worked without using any regular expressions. This last Replace or Remove will give a very brief introduction to the use of regular expressions by utilising one simple statement.
Regular expressions within the Biblioscape import filters are identified by RE(Content of Expression)RE.
The “RE(“ declaration starts the expression with the “)RE” declaration ending the expression. When the Biblioscape import filter encounters those identifiers it utilises the alternative regular expression engine for the statements contained within the regular expressions start and end declarations.
Select the “Replace or Remove” tab and make the following entry:-
Remember to use the tick button after the entry is made.
The meta character ^ has already been used in the context of indicating the beginning of a line, so it is helpful that it still means the same within a regular expression.
The regular expression meta character “\W” means a non-alphanumeric character (in most regular expression engines, any character other than a letter, a digit or the underscore).
The regular expression find and replace statement means - Find the start of a line where it is followed by two non-alphanumeric characters and replace them with nothing. This should remove the first line containing the “)” in the Document field in the 1913 reference and realign the 1828 document so the text starts at the beginning of that field.
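The find and replace statement described above can be tried outside Biblioscape with Python's re module. This is a sketch only: the pattern ^\W\W is inferred from the description (start of line followed by two non-alphanumeric characters), Python's engine is not the one Biblioscape uses, and whether Biblioscape treats ^ as the start of every line or only the start of the field is not confirmed here. The sample strings are hypothetical.

```python
import re

# Start of the text, followed by exactly two non-word characters.
LEADING_TWO_NON_WORD = re.compile(r"^\W\W")


def strip_leading_non_word(text):
    """Remove two leading non-word characters, if both are present."""
    # Without re.MULTILINE, ^ matches only at the very start of the text.
    return LEADING_TWO_NON_WORD.sub("", text, count=1)


cleaned_bracket = strip_leading_non_word(")\nPrivacy")
cleaned_newline = strip_leading_non_word("\r\nPrivacy")
untouched = strip_leading_non_word("Privacy")
```

A closing bracket followed by a new line, or a carriage return and new line pair, both count as two non-word characters, so either is removed; text that already starts with a letter is left alone.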
Test the filter to see the difference, notice that the Document field in the references has changed.
Now add “Title” to the Replace or Remove list, place the same Regular Expression statement in the “Find what:” field and test the filter again to ensure it works correctly now that it is finished.
If required a simple find and replace could be used to remove the excess details which have been entered elsewhere in the record from the rich text field of the second reference.
Remember the importance of ordering the parsing sequence correctly.
A simple method of remembering a correct sequence of parsing actions for the Replace or Remove is:-
· Import document (Any Tag Field replace and remove must appear before any replace and remove parsing of the same target database field, with multiple actions being ordered sequentially from the beginning of the document).
· Biblioscape Database Field (Multiple actions within the same field are parsed in a sequence cognizant of the contents of the field following any previous parsing action).
If the reasons and importance for the parsing sequence in the Replace or Remove and differences between a tag field identifier and a database field identifier are not clear continue practicing Replace and Remove actions, trying out different things and testing the results to gain a clear comprehension. e.g. Change the order of the last two Replace or Remove actions to see how a simple change in the order of an action can affect the outcome.
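The effect of swapping two Replace or Remove actions can also be seen in a small Python model. It assumes, as a simplification, that actions run one after another and that each sees the text left by the previous one; the function name apply_actions and the sample text are hypothetical, and Python's re module stands in for Biblioscape's own matching.

```python
import re


def apply_actions(text, actions):
    """Apply (pattern, replacement) pairs in the order given."""
    for pattern, replacement in actions:
        # Each action works on the output of the one before it.
        text = re.sub(pattern, replacement, text, count=1)
    return text


remove_label = (re.escape("edition:"), "")  # plain text find, replace with nothing
strip_leading = (r"^\W\W", "")              # two leading non-word characters

sample = "edition:\r\nPrivacy"  # hypothetical field content
label_first = apply_actions(sample, [remove_label, strip_leading])
regex_first = apply_actions(sample, [strip_leading, remove_label])
```

With the label removed first, the text then starts with a carriage return and new line, so the regular expression has something to match and the result is the bare word. With the order reversed, the regular expression runs while the text still starts with a letter, matches nothing, and the line break is left behind.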
The parsing sequence in the Complex Fields is equally important, even if more immediately obvious.