author Sachin Divekar <ssd532@gmail.com> 2016-06-26 18:08:05 +0530
committer ven <vendethiel@hotmail.fr> 2016-06-26 14:38:05 +0200
commit d1216a4253c1b03641c10b171030d04227ad8408
tree f131979ad24915625010a8fad66491bbe1130450 /pcre.html.markdown
parent d4aa031d55e03a08f92fd2e3c407655eaf3ef4f2
Add an example of trap command (#1826)
* Begin writing document for PCRE
  Started writing learnxinyminutes document for PCRE to cover general purpose regular expressions. Added introduction and a couple of details.
* Change introductory example for regex
  The old example was incorrect. It's replaced with a simple one.
* Add some more introductory text
* Add first example
* Added more example and a table for proper formatting
* Add few more examples
* Formatting
* Improve example
* Edit description of character classes
* Add a way to test regex
  Add https://regex101.com/ web application to test the regex provided in example.
* Add example of trap command
  trap is a very important command to intercept a fatal signal, perform cleanup, and then exit gracefully. It needs an entry in this document. Here a simple and most common example of using trap command i.e. cleanup upon receiving signal is added.
* Revert "Add example of trap command"
* Add an example of trap command
  `trap` is a very important command to intercept a fatal signal, perform cleanup, and then exit gracefully. It needs an entry in this document. Here a simple and most common example of using `trap` command i.e. cleanup upon receiving signal is added.
Diffstat (limited to 'pcre.html.markdown')
-rw-r--r-- pcre.html.markdown 82
1 files changed, 82 insertions, 0 deletions
diff --git a/pcre.html.markdown b/pcre.html.markdown
new file mode 100644
index 00000000..0b61653d
--- /dev/null
+++ b/pcre.html.markdown
@@ -0,0 +1,82 @@
+---
+language: PCRE
+filename: pcre.txt
+contributors:
+ - ["Sachin Divekar", "http://github.com/ssd532"]
+
+---
+
+A regular expression (regex or regexp for short) is a special text string that describes a search pattern. For example, to extract the protocol (scheme) from a URL we can use `/^[a-z]+:/`, which matches `http:` in `http://github.com/`.
+
+PCRE (Perl Compatible Regular Expressions) is a C library implementing regex. It was written in 1997, when Perl was the de facto choice for complex text-processing tasks. The pattern syntax used in PCRE closely resembles Perl's. PCRE is used in many large projects, including PHP, Apache, and R.
+
+
+There are two different sets of metacharacters:
+* Those that are recognized anywhere in the pattern except within square brackets
+```
+ \ general escape character with several uses
+ ^ assert start of string (or line, in multiline mode)
+ $ assert end of string (or line, in multiline mode)
+ . match any character except newline (by default)
+ [ start character class definition
+ | start of alternative branch
+ ( start subpattern
+ ) end subpattern
+ ? extends the meaning of (
+ also 0 or 1 quantifier
+ also quantifier minimizer
+ * 0 or more quantifier
+ + 1 or more quantifier
+ also "possessive quantifier"
+ { start min/max quantifier
+```
+
+* Those that are recognized within square brackets. The part of a pattern inside square brackets is called a character class.
+
+```
+
+ \ general escape character
+ ^ negate the class, but only if it is the first character
+ - indicates character range
+ [ POSIX character class (only if followed by POSIX syntax)
+ ] terminates the character class
+
+```
+
+PCRE provides some generic character types, which can be used both inside and outside character classes.
+```
+ \d any decimal digit
+ \D any character that is not a decimal digit
+ \h any horizontal white space character
+ \H any character that is not a horizontal white space character
+ \s any white space character
+ \S any character that is not a white space character
+ \v any vertical white space character
+ \V any character that is not a vertical white space character
+ \w any "word" character
+ \W any "non-word" character
+```
+
+## Examples
+
+We will test our examples on the following string: `66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"`. It is a line from a standard Apache access log.
+
+| Regex | Result | Comment |
+| :---- | :-------------- | :------ |
+| `GET` | GET | `GET` matches the characters GET literally (case sensitive) |
+| `\d+\.\d+\.\d+\.\d+` | 66.249.64.13 | `\d+` matches a digit [0-9] one or more times, as defined by the `+` quantifier; `\.` matches `.` literally |
+| `(\d+\.){3}\d+` | 66.249.64.13 | `(\d+\.){3}` matches the group (`\d+\.`) exactly three times; the final `\d+` matches the last number |
+| `\[.+\]` | [18/Sep/2004:11:07:48 +1000] | `\[` and `\]` match the brackets literally; `.+` matches one or more of any character (except newline) |
+| `^\S+` | 66.249.64.13 | `^` means start of the line; `\S+` matches one or more non-whitespace characters |
+| `\+[0-9]+` | +1000 | `\+` matches the character `+` literally. The character class `[0-9]` matches a single digit. The same can be achieved using `\+\d+` |
+
+All these examples can be tried at https://regex101.com/
+
+1. Copy the example string into the `TEST STRING` section
+2. Copy the regex into the `Regular Expression` section
+3. The web application will show the matching result
+
+
+## Further Reading
+
+
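As a quick offline check of the Examples table in the patch above, the patterns can be run against the sample log line with a short script. This is only a sketch: it uses Python's `re` module, which is not PCRE, on the assumption that the two behave the same for the simple constructs used in these examples. The names `LOG_LINE`, `EXAMPLES`, and `first_match` are illustrative and do not come from the patch.

```python
import re

# Sample Apache access-log line from the Examples section of the patch.
LOG_LINE = (
    '66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] '
    '"GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"'
)

# (pattern, expected first match) pairs taken from the Examples table.
# The dots in the IP-address pattern are escaped, as the table comment describes.
EXAMPLES = [
    (r"GET", "GET"),
    (r"\d+\.\d+\.\d+\.\d+", "66.249.64.13"),
    (r"(\d+\.){3}\d+", "66.249.64.13"),
    (r"\[.+\]", "[18/Sep/2004:11:07:48 +1000]"),
    (r"^\S+", "66.249.64.13"),
    (r"\+[0-9]+", "+1000"),
]


def first_match(pattern, text=LOG_LINE):
    """Return the text of the first match of pattern in text, or None."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


if __name__ == "__main__":
    # Print each pattern next to what it matched in the log line.
    for pattern, expected in EXAMPLES:
        print(f"{pattern:<22} -> {first_match(pattern)!r}")
```

The same helper can check the introductory pattern: `first_match(r"^[a-z]+:", "http://github.com/")` returns `http:`. Escaping the dots matters because an unescaped `.` matches any character, so `\d+.\d+.\d+.\d+` would also accept a string such as `66a249b64c13`.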