Merge pull request #1 from adambard/master

update
author: Suzane Sant Ana <tetestonaldo@gmail.com> 2017-12-31 14:27:06 -0200
committer: GitHub <noreply@github.com> 2017-12-31 14:27:06 -0200
commit: 42f9329bb3a028d374d6397991ac48b44064741e (patch)
tree: 1e75e2b3e122aeb863e3ffa037f6f64c4027fbf8 /pcre.html.markdown
parent: e6b77595f2669d66ac7be43c6e6083cbff80a9a7 (diff)
parent: 70a36c9bd970b928adde06afb2bd69f6ba8e5d5c (diff)
1 files changed, 82 insertions, 0 deletions
diff --git a/pcre.html.markdown b/pcre.html.markdown
new file mode 100644
index 00000000..0b61653d
--- /dev/null
+++ b/pcre.html.markdown
@@ -0,0 +1,82 @@
+---
+language: PCRE
+filename: pcre.txt
+contributors:
+    - ["Sachin Divekar", "http://github.com/ssd532"]
+    
+---
+
+A regular expression (regex or regexp for short) is a special text string that describes a search pattern. For example, to extract the protocol from a URL we can use `/^[a-z]+:/`, which matches `http:` in `http://github.com/`.
+
+PCRE (Perl Compatible Regular Expressions) is a C library that implements regular expressions. It was written in 1997, when Perl was the de facto choice for complex text-processing tasks, and its pattern syntax closely resembles Perl's. PCRE is used in many large projects, including PHP, Apache, and R.
+
+
+There are two different sets of metacharacters:
+* Those that are recognized anywhere in the pattern except within square brackets
+```
+  \      general escape character with several uses
+  ^      assert start of string (or line, in multiline mode)
+  $      assert end of string (or line, in multiline mode)
+  .      match any character except newline (by default)
+  [      start character class definition
+  |      start of alternative branch
+  (      start subpattern
+  )      end subpattern
+  ?      extends the meaning of (
+         also 0 or 1 quantifier
+         also quantifier minimizer
+  *      0 or more quantifier
+  +      1 or more quantifier
+         also "possessive quantifier"
+  {      start min/max quantifier
+```
+
+* Those that are recognized within square brackets. The part of a pattern inside square brackets is called a character class.
+ 
+```
+ 
+  \      general escape character
+  ^      negate the class, but only if it is the first character
+  -      indicates character range
+  [      POSIX character class (only if followed by POSIX syntax)
+  ]      terminates the character class
+  
+```  
+
+PCRE provides some generic character types, also called character classes.
+```
+  \d     any decimal digit
+  \D     any character that is not a decimal digit
+  \h     any horizontal white space character
+  \H     any character that is not a horizontal white space character
+  \s     any white space character
+  \S     any character that is not a white space character
+  \v     any vertical white space character
+  \V     any character that is not a vertical white space character
+  \w     any "word" character
+  \W     any "non-word" character
+```
+
+## Examples
+
+We will test our examples on the following string: `66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"`. It is a line from a standard Apache access log.
+
+| Regex | Result          | Comment |
+| :---- | :-------------- | :------ |
+| `GET` | `GET` | `GET` matches the characters GET literally (case sensitive) |
+| `\d+\.\d+\.\d+\.\d+` | `66.249.64.13` | `\d+` matches a digit (`[0-9]`) one or more times, as specified by the `+` quantifier; `\.` matches `.` literally |
+| `(\d+\.){3}\d+` | `66.249.64.13` | `(\d+\.){3}` matches the group `(\d+\.)` exactly three times |
+| `\[.+\]` | `[18/Sep/2004:11:07:48 +1000]` | `\[` and `\]` match the square brackets literally; `.` is any character (except newline), so `.+` matches one or more of them |
+| `^\S+` | `66.249.64.13` | `^` means start of the line; `\S+` matches one or more non-whitespace characters |
+| `\+[0-9]+` | `+1000` | `\+` matches the character `+` literally. The character class `[0-9]` matches a single digit. The same can be achieved with `\+\d+` |
+
+All these examples can be tried at https://regex101.com/ 
+
+1. Copy the example string into the `TEST STRING` section
+2. Copy the regex into the `Regular Expression` section
+3. The web application will show the matching result
+
+
+## Further Reading
+
+
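The patterns in the Examples table of the added file can also be checked without a browser. The following is a minimal sketch in Python, not part of the commit. It rests on one assumption worth stating loudly: Python's built-in `re` module is not PCRE, but the constructs used in the table (`\d`, `\S`, `^`, `+`, `{3}`, groups, and bracketed classes) behave the same way in it. The names `LOG_LINE`, `EXAMPLES`, and `first_match` are hypothetical and exist only for this illustration.

```python
import re

# The sample Apache access log line from the Examples section.
LOG_LINE = (
    '66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] '
    '"GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"'
)

# (pattern, expected first match) pairs taken from the table rows.
# Assumption: Python's `re` stands in for PCRE here; it is a different
# engine, but these particular constructs are shared by both.
EXAMPLES = [
    (r"GET", "GET"),
    (r"\d+\.\d+\.\d+\.\d+", "66.249.64.13"),
    (r"(\d+\.){3}\d+", "66.249.64.13"),
    (r"\[.+\]", "[18/Sep/2004:11:07:48 +1000]"),
    (r"^\S+", "66.249.64.13"),
    (r"\+[0-9]+", "+1000"),
]


def first_match(pattern, text=LOG_LINE):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for pattern, expected in EXAMPLES:
        status = "ok" if first_match(pattern) == expected else "MISMATCH"
        print(f"{pattern:25} {status}")
```

The sketch also makes it easy to see why the dots in the IP address pattern need escaping: an unescaped `.` matches any character, so `\d+.\d+.\d+.\d+` would accept a string such as `1a2b3c4`, while `\d+\.\d+\.\d+\.\d+` would not.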