Friday 26 July 2013

Match two marks on one line or on two successive lines with grep or awk

Sometimes we need to check whether a text file contains two marks in a given order, either on the same line or on two successive lines; any other arrangement should not count. For example, we may want to check whether a file contains "operation system" followed by "Linux", in one of the following forms:

On one line:
        "operation system Linux" or "operation systems such as Unix, Linux, Windows..."
or on two successive lines:
        "operation system
        Linux"
or
        "operation systems such
        as Unix, Linux, Windows..."
but not with "Linux" first:
        "Linux operation system"
nor
        "Linux
        operation system"
and not with other lines in between:
        "operation system
        ......
        Linux"

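pcregrep

The -M option lets a pattern match across line boundaries. Because . does not match a newline, the (\n|.) group allows at most one line break between the two marks: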
pcregrep -M 'operation system.*(\n|.).*Linux' test.txt
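awk

The one-liner sets k to 1 on a line containing "operation system", stores that line in ln, and increments k after every line, so k == 1 means "Linux" is on the same line and k == 2 means it is on the next line: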
awk '/operation system/{k=1; ln=$0}  /Linux/{if(k == 1){print "--------"; print ;} if(k == 2){print "---------"; print ln; print;}} {if(k > 0){k++}}' test.txt
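Problem: this one-liner also matches "Linux operation system", because k == 1 only tells us that both marks are on the same line, not which one comes first. How can we avoid that?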
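To see the problem concretely, the sketch below wraps the original one-liner, unchanged, in a shell function (orig_match is a hypothetical name chosen here) and feeds it a single line with the marks in the wrong order. The line is still reported, under the "--------" header the one-liner uses for one-line matches.

```shell
#!/bin/sh
# orig_match: hypothetical name for the original one-liner, unchanged.
orig_match() {
    awk '/operation system/{k=1; ln=$0}  /Linux/{if(k == 1){print "--------"; print ;} if(k == 2){print "---------"; print ln; print;}} {if(k > 0){k++}}' "$@"
}

# Both marks on one line, but in the wrong order. k is 1 when the /Linux/
# rule runs, so the line is printed as a one-line match:
#   --------
#   Linux operation system
printf 'Linux operation system\n' | orig_match
```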

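Improvement

Check the order of the two marks on one line with a single regex, and remember whether the previous line contained "operation system" for the two-line case. Make the script executable and run it:
chmod +x match.awk
./match.awk test.txt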
match.awk
#!/usr/bin/awk -f

BEGIN {
        print " - BEGIN - "
        m1 = 0;  # 1 if the previous line contained mark1 "operation system"
        ln = ""; # that previous line, printed for two-line matches
}
{
        if ($0 ~ /operation system.*Linux/) {
                # two marks in one line, mark1 before mark2
                print "----1----";
                print $0;
                m1 = 0;
                ln = "";
        }
        else if ($0 ~ /operation system/) {
                # mark1 only: remember this line for the next one
                m1 = 1;
                ln = $0;
        }
        else if ($0 ~ /Linux/ && m1 == 1) {
                # two marks in adjacent lines
                print "----2----";
                print ln;
                print $0;
                # reset, so ln is not paired with a later "Linux" line as well
                m1 = 0;
                ln = "";
        }
        else {
                # clear all variables for an unmatched line
                m1 = 0;
                ln = "";
        }
}
END { print " - DONE - " }
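The branches of match.awk can also be written as awk pattern-action rules, which stays closer to the style of the original one-liner. The sketch below wraps them in a shell function; match_marks and prev are names introduced here for illustration, and the BEGIN/DONE banners are left out. It remembers the previous line only when that line contains "operation system", and clears it as soon as a pair is printed, so one "operation system" line is never paired with two different "Linux" lines.

```shell
#!/bin/sh
# match_marks: hypothetical name; the rules follow the branches of match.awk.
match_marks() {
    awk '
        # mark1 before mark2 on the same line
        /operation system.*Linux/ { print "----1----"; print; prev = ""; next }
        # mark1 only: remember the line for the next one
        /operation system/        { prev = $0; next }
        # mark2 right after a mark1 line
        prev != "" && /Linux/     { print "----2----"; print prev; print; prev = ""; next }
        # any other line breaks the chain
                                  { prev = "" }
    ' "$@"
}
```

Running match_marks test.txt on the test input should print the same pairs as the test output, without the BEGIN and DONE lines.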
Test Output
 - BEGIN -
----1----
operation system Linux
----2----
operation system
Linux
----1----
operation system such as Linux
----2----
operation system
such as Linux
 - DONE -
Test Input: test.txt
operation system Linux

operation system
Linux

operation system such as Linux

operation system
such as Linux

Linux
operation system

Linux operation system

operation system

Linux

operation system
such as
Linux
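When the only question is whether a file contains such a pair at all, as in the opening paragraph, awk can answer through its exit status instead of printing the lines. A minimal sketch, assuming a POSIX awk; has_marks is a hypothetical function name. It stops at the first pair and exits with 0, or exits with 1 if no pair is found.

```shell
#!/bin/sh
# has_marks: hypothetical name. Exit status 0 if "operation system" is
# followed by "Linux" on the same line or on the next line, 1 otherwise.
has_marks() {
    awk '
        # mark1 before mark2 on the same line
        /operation system.*Linux/ { found = 1; exit }
        # mark2 on the line right after a mark1 line
        prev && /Linux/           { found = 1; exit }
        # remember whether this line holds mark1 (1 or 0)
                                  { prev = ($0 ~ /operation system/) }
        # exit inside a rule still runs END, which sets the final status
        END                       { exit !found }
    ' "$@"
}
```

For example, has_marks test.txt && echo found prints found for the test input, since its first line already matches.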
