Friday 26 July 2013

Match two marks on one line or on two successive lines with grep or awk

Sometimes we need to check whether a text file contains two pieces of information (two marks) that appear, in order, either on one line or on two successive lines, and nowhere else. For example, we may want to check whether a file contains "operation system" followed by "Linux". Only the following formats should match:

In one line:
        "operation system Linux" or "operation systems such as Unix, Linux, Windows..."
or
        "operation system
        Linux"
or
         "operation systems such
          as Unix, Linux, Windows... "
but not
         "Linux operation system"
not
         "Linux
          operation system"
not
         "operation system
         ......
         Linux"

pcregrep
pcregrep -M 'operation system.*(\n|.).*Linux' test.txt
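As a rough cross-check of the pattern, the same regular expression can be tried with Python's re module, where . also does not match a newline by default. This is only a sketch: the name find_marks and the sample text are made up for illustration, and the whole text is held in one string instead of being scanned by pcregrep.

```python
import re

# Same pattern as in the pcregrep command above.
PATTERN = re.compile(r"operation system.*(\n|.).*Linux")


def find_marks(text):
    """Return every substring where the two marks share a line or sit on adjacent lines."""
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = (
    "operation system Linux\n"
    "\n"
    "operation system\n"
    "Linux\n"
    "\n"
    "Linux operation system\n"
    "\n"
    "operation system\n"
    "such as\n"
    "Linux\n"
)

for found in find_marks(sample):
    print(repr(found))
```

Only the first two cases in the sample are reported. The reversed order and the pair separated by an extra line are skipped, because the pattern allows at most one newline between the two marks.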
awk
awk '/operation system/{k=1; ln=$0}  /Linux/{if(k == 1){print "--------"; print ;} if(k == 2){print "---------"; print ln; print;}} {if(k > 0){k++}}' test.txt
Problem: this awk one-liner also matches "Linux operation system", because both patterns fire on the same line regardless of their order. How can this be avoided?

Improvement
./match.awk test.txt
match.awk
#!/usr/bin/awk -f

BEGIN \
{
        print " - BEGIN - "
        m1 = 0; # 1 when the previous line contained mark1 only
        ln = ""; # last line
}
{
        # two marks in one line
        if ($0 ~ /operation system.*Linux/) {
                print "----1----";
                print $0;
                # forget any earlier mark1 line, it is no longer adjacent
                m1 = 0;
                ln = "";
        }
        else if ($0 ~ /operation system/) {
                # mark1 only: remember the line and wait for mark2
                m1 = 1;
                ln = $0;
        }
        else if ($0 ~ /Linux/ && m1 == 1) {
                # two marks in adjacent lines
                print "----2----";
                print ln;
                print $0;
                # do not pair the same mark1 line with a second "Linux" line
                m1 = 0;
                ln = "";
        }
        else {
                # clear all variables for unmatched line
                m1 = 0;
                ln = "";
        }
}
END { print " - DONE - " }
Test Output
 - BEGIN -
----1----
operation system Linux
----2----
operation system
Linux
----1----
operation system such as Linux
----2----
operation system
such as Linux
 - DONE -
Test Input: test.txt
operation system Linux

operation system
Linux

operation system such as Linux

operation system
such as Linux

Linux
operation system

Linux operation system

operation system

Linux

operation system
such as
Linux
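
The line-by-line rule used in match.awk can also be sketched in Python. This is a minimal illustration, not a replacement for the script: the function name match_marks and the short sample list are assumptions made for the example, and the state is reset after every match so that one "operation system" line is never paired twice.

```python
import re

FIRST = re.compile(r"operation system")            # mark1
SECOND = re.compile(r"Linux")                      # mark2
SAME_LINE = re.compile(r"operation system.*Linux")  # mark1 then mark2 on one line


def match_marks(lines):
    """Return (kind, matched_lines) tuples for same-line and adjacent-line matches."""
    results = []
    previous = None  # previous line if it contained mark1 only, else None
    for raw in lines:
        line = raw.rstrip("\n")
        if SAME_LINE.search(line):
            results.append(("same line", [line]))
            previous = None
        elif FIRST.search(line):
            previous = line
        elif SECOND.search(line) and previous is not None:
            results.append(("adjacent lines", [previous, line]))
            previous = None
        else:
            previous = None
    return results


sample = [
    "operation system Linux",
    "",
    "operation system",
    "Linux",
    "",
    "Linux operation system",
    "",
    "operation system",
    "such as",
    "Linux",
]

for kind, matched in match_marks(sample):
    print(kind, matched)
```

Because the same-line test runs first and requires "operation system" before "Linux", the line "Linux operation system" is not reported, which is the case the one-liner got wrong.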

Monday 22 July 2013

[English Writing] comma (,) usage

Basic Rules

For more, please refer to grammarbook.com

R1. Dependent clause + Main clause

When starting a sentence with a weak clause, use a comma after it. Conversely, do not use a comma when the sentence starts with a strong clause followed by a weak clause. (from grammarbook.com)
Examples:
  • If I ever complete my PhD successfully, I will be amazed.
    • I will be amazed if I ever complete my PhD successfully.
  • When I use it, the e-mail always breaks down.
    • The e-mail always breaks down when I use it.

R2. Adverbs that relate to the whole sentence

Examples:
  • However, some researchers adopt a markedly different approach.
  • Surprisingly, he did not go to school yesterday.

R3. Speech + comment

Examples:
  • "Paul Klee," he remarked, "was the first person to actually paint commas".

R4. When a sentence is 'broken open' in the middle

Examples:
  • Some researchers, on the other hand, adopt a different approach.
  • There are, it seems, two different solutions to the euthanasia problem.

R5. Items in lists

Use commas to separate words and word groups in a series of three or more.
Examples:
  • Many essays are over-long, convoluted and boring. [UK]
  • Many essays are over-long, convoluted, and boring. [US]
  • Do not steal, copy, or plagiarise ideas.

R6. Separate two adjectives when "and" can be inserted between them

Examples:
  • He is a strong, healthy man.
    • He is a strong and healthy man.
  • He is a lonely, young boy.

R7. Separate the day and month from the year when writing a date

Examples:
  • I will be on a business trip from May 15th, 2011.
  • I will be on a business trip from May 2011. (If the day, month or year is omitted, no comma is needed.)

[English Writing] hyphen and dashes

Three types of dashes, with the corresponding LaTeX input in parentheses

  • Hyphen - (-)
  • en dash – (--)
  • em dash — (---)
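
The LaTeX input in the list above maps mechanically onto the typeset characters, which can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name latex_dashes_to_unicode is made up, and the sketch ignores math mode, verbatim text and comments, where a real LaTeX run would treat hyphens differently.

```python
EN_DASH = "\u2013"  # typeset from "--"
EM_DASH = "\u2014"  # typeset from "---"


def latex_dashes_to_unicode(text):
    """Replace LaTeX dash input with the Unicode dash characters."""
    # Handle "---" first so it is not read as "--" followed by "-".
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


print(latex_dashes_to_unicode("follow-up"))
print(latex_dashes_to_unicode("section 3--5"))
print(latex_dashes_to_unicode("real wisdom---that, friends"))
```

A single hyphen is left untouched, so compound words such as follow-up pass through unchanged.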

Hyphen - (-)

A short horizontal mark of punctuation ( - ) used between the parts of a compound word or name or between the syllables of a word when divided at the end of a line. (from About)
Examples:
  • concatenation of compound words, e.g. eight-year-old
  • on-site support
  • follow-up

en dash – (--)

  • An en dash, roughly the width of an n, is a little longer than a hyphen. It is used for periods of time when you might otherwise use to. (from grammarbook)
  • An en dash is also used in place of a hyphen when combining open compounds.
    • attaches a prefix or suffix to an unhyphenated compound
Examples:
  • section 3–5
  • post–World War I treaty
  • New York–based writer

em dash — (---)

A mark of punctuation (—), technically known as an em dash, used to set off a word or phrase after an independent clause or to set off words, phrases, or clauses that interrupt a sentence. (from About)
"A dash is a mark of separation stronger than a comma, less formal than a colon, and more relaxed than parentheses." (William Strunk, Jr, and E.B. White, The Elements of Style)

Set Off Words or Phrases After an Independent Clause

"Life, said Samuel Butler, is like giving a concert on the violin while learning to play the instrument—that, friends, is real wisdom." (Saul Bellow, "My Paris," 1983)
"By trying we can easily learn to endure adversity—another man's, I mean." (Mark Twain)

Dashes Used to Set Off Words or Phrases That Interrupt a Sentence

Then, a review of model checking approaches—refinement based and temporal logic based—for currently applicable tools is given, which provides insight into the appropriate approaches for Circus and CML.
"Copper Lincoln cents—pale zinc-coated steel for a year in the war—figure in my earliest impressions of money." (John Updike, "A Sense of Change." The New Yorker, Apr. 26, 1999)

Friday 19 July 2013

Reference numbering in LaTeX

Problems

Citing several references together can produce numbers in an odd order.
For example, \cite{vdm2001} \cite{vdm1999} is sometimes typeset as VDM [2][1].
The order of the numbers is wrong. It follows the order in which the \cite commands are written, whereas each number is assigned by which reference is cited first in the document, so the numbers should be listed in ascending order.

Solution 1

Use the cite package and put both keys in one \cite command:
\usepackage{cite}
\cite{vdm2001, vdm1999}
This will result in VDM [1,2].

Solution 2

Use the natbib package with options
\usepackage[sort&compress]{natbib}
  • \citet for textual citations
  • \citep for parenthetical citations
For example
  • \citet{vdm2001}
  • \citep{vdm2001}
  • \citet[p.~20]{vdm2001}
  • \citet[chap.~2]{vdm2001}
  • \citet{vdm2001,vdm1999}
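
What "sort and compress" means for numeric citations can be sketched in Python. This is not natbib's or cite's implementation, only an illustration of the idea: the function name sort_and_compress is made up, and collapsing only runs of three or more consecutive numbers is an assumption of this sketch.

```python
def sort_and_compress(numbers):
    """Format citation numbers in ascending order, collapsing runs of 3 or more."""
    ordered = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(ordered):
        # Extend j to the end of the run of consecutive numbers starting at i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        if j - i >= 2:
            # Three or more consecutive numbers become a range with an en dash.
            parts.append(f"{ordered[i]}\u2013{ordered[j]}")
        else:
            parts.extend(str(n) for n in ordered[i:j + 1])
        i = j + 1
    return "[" + ",".join(parts) + "]"


print(sort_and_compress([2, 1]))
print(sort_and_compress([5, 3, 1, 2]))
```

With the two VDM references from the example, the numbers 2 and 1 come out as [1,2] whatever order the keys were written in.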

Thursday 11 July 2013

R: basics - installation, environment and debugging

Install R

  • $ sudo apt-get install r-base

R edit environment

R basics

  • Install extra package
    • > install.packages("ismev")
  • Source R script
    • > source("fit-gumbel.r")
  • Call function provided by fit-gumbel.r
    • > fit-gumbel("in.dat")

R debug

Take the optim function as an example.

  • display source code of optim
    • > edit(optim)
  • set debug mode
    • > debug(optim)
  • check debug mode
    • > isdebugged(optim)
  • cancel debug mode
    • > undebug(optim)
  • debug optim (R source code src/library/stats/R/optim.R)
    • > debug(optim)
    • > out <- gumbel(dat)
    • Browser[2]> where - show the call stack
    • Browser[2]> [RET] - next step
    • Browser[2]> c - continue
    • Browser[2]> n - next step
    • Browser[2]> Q - quit the browser
    • Browser[2]>

tmux usage

  • tmux quick start
    • tmux to start the tmux terminal environment
    • C-b % to split window into two panes (vertically)
    • C-b :split-window to split window into two panes (horizontally)
    • C-b o switch to next pane
    • C-b x kill current pane
  • C-b has been changed to C-a
    • download a tmux.conf file
    • tmux source-file /devt/dev-rye/tmux-conf/tmux.conf
    • C-a | split vertically
    • C-a - split horizontally