regex question: can we use part of a match as the replacement?

In the Import Filter editor, on the "Replace or Remove" tab, one can use a regular expression (regex) to find a match and replace that match with some other text. My question is: can we use one part of the match as the replacement?

E.g., I have a field that contains repeated terms. Two examples follow:

physical review physical review
physical review letters physical review letters

I want to get rid of the repeated part.

We can match the whole term with RE((.*) \1)RE, where \1 matches the part that is repeated. The question is: can I use this matched part \1 as the replacement? That way we would get rid of the repetition in the field.

Thanks

Regex replace

It is possible to achieve what you are asking for from within regular expressions themselves by using the 'replace' function.

Looking quickly at the TRegexp help, 'replace' seems to be implemented by 'Substitute'.

I have not checked within the Biblioscape import filters to see if/how that functionality of regular expressions works there, but it would certainly seem worth you exploring.
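To illustrate what replacing a match with one of its own groups looks like in a general-purpose engine, here is a minimal sketch using Python's `re` module. It is not Biblioscape's import filter or TRegExpr, and the helper name `collapse_repeat` is made up for the example. The pattern is the `(.*) \1` idea from the question, anchored and with `.+` so that only a whole field consisting of a phrase repeated twice is rewritten.

```python
import re


def collapse_repeat(field: str) -> str:
    """Replace 'X X' with 'X' when the whole field is a phrase repeated twice."""
    # ^(.+)  capture a non-empty phrase from the start of the field
    # " \1$" require one space, then the same phrase again, up to the end
    # The replacement r"\1" puts the captured phrase back once.
    return re.sub(r"^(.+) \1$", r"\1", field)


if __name__ == "__main__":
    for sample in (
        "physical review physical review",
        "physical review letters physical review letters",
        "nature",
    ):
        print(repr(sample), "->", repr(collapse_repeat(sample)))
```

Whether the same `\1` replacement syntax is accepted in the Biblioscape filter editor is a separate question that this sketch does not answer.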

Ian

Thanks, Ian, "Exploring" is

Thanks, Ian,

"Exploring" is certainly "worth" doing; it is exactly what I have just done, and it did lead me to the solution (see my earlier reply to Paul). I appreciate your encouragement.

You cannot put the matched

You cannot use the matched pattern as the replacement. Biblioscape needs literal text as the replacement; the replacement part cannot be a pattern. With a regular expression, can you match just the first part of a repetition? Then you can put an empty string in the replacement part, so the matched part is removed.

It can be done

Of course, regular expressions are very powerful. (As with lots of tools of this kind, the only thing needed is to dive into the manual or, nowadays, search the web.)

Using the regex "lookahead" feature, one can look ahead in the string without including what follows in the match.

For my problem, to get rid of the first part if it is repeated later, I can match the first part with the following regex construct (extremely simple once you know it):
(.*) (?=\1)
I can then replace this match with blank to delete it.
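The lookahead trick can be tried outside Biblioscape too. Below is a minimal sketch in Python's `re` module, with a hypothetical helper name `drop_first_copy`. It assumes the field is exactly one phrase repeated twice, so it uses a slightly stricter variant of the pattern above: anchored with `^` and `$`, and `.+` instead of `.*`. In Python, the unanchored form with `.*` can also match at a later space with an empty group when every occurrence is replaced, which is why the sketch tightens it; how Biblioscape treats that case is not something this example shows.

```python
import re


def drop_first_copy(field: str) -> str:
    """Delete the first copy of a phrase when the same phrase follows it."""
    # ^(.+)     capture a non-empty phrase at the start of the field
    # " "       the single space separating the two copies
    # (?=\1$)   lookahead: the same phrase must follow up to the end,
    #           but it is not consumed, so it survives the replacement
    return re.sub(r"^(.+) (?=\1$)", "", field)


if __name__ == "__main__":
    for sample in (
        "physical review physical review",
        "physical review letters physical review letters",
        "nature",
    ):
        print(repr(sample), "->", repr(drop_first_copy(sample)))
```

Because only the first copy and its trailing space are matched, the replacement can be a plain empty string, which fits a tool that accepts only literal replacement text.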

Paul, I think I am now ready to make the generic import filter; I will post it when it is done. Love this powerful Biblioscape product, and love working within this community too!

I am happy to hear that

I am happy to hear that there is a solution within regular expressions. I only know RE superficially, but I know it is so powerful that a lot of seemingly impossible things can be done. Thanks, and I look forward to seeing your generic filter. Paul

Thanks, Paul, for making it

Thanks, Paul,

for making it clear that it is not possible to put the matched part in the replacement.

It is hard (maybe impossible) to select only the repeated part using regex, since you have to match the whole thing before you find out there is a repetition. You always select the whole thing. Maybe I'll find some other workaround.