Diffstat (limited to 'pcre.html.markdown')
-rw-r--r-- pcre.html.markdown 47
1 files changed, 24 insertions, 23 deletions
diff --git a/pcre.html.markdown b/pcre.html.markdown
index 0b61653d..9e091721 100644
--- a/pcre.html.markdown
+++ b/pcre.html.markdown
@@ -3,16 +3,18 @@ language: PCRE
filename: pcre.txt
contributors:
- ["Sachin Divekar", "http://github.com/ssd532"]
-
+
---
-A regular expression (regex or regexp for short) is a special text string for describing a search pattern. e.g. to extract domain name from a string we can say `/^[a-z]+:/` and it will match `http:` from `http://github.com/`.
+A regular expression (regex or regexp for short) is a special text string for describing a search pattern. e.g. to extract domain name from a string we can say `/^[a-z]+:/` and it will match `http:` from `http://github.com/`.
PCRE (Perl Compatible Regular Expressions) is a C library implementing regex. It was written in 1997 when Perl was the de-facto choice for complex text processing tasks. The syntax for patterns used in PCRE closely resembles Perl. PCRE syntax is being used in many big projects including PHP, Apache, R to name a few.
There are two different sets of metacharacters:
+
* Those that are recognized anywhere in the pattern except within square brackets
+
```
\ general escape character with several uses
^ assert start of string (or line, in multiline mode)
@@ -32,18 +34,19 @@ There are two different sets of metacharacters:
```
* Those that are recognized within square brackets. Outside square brackets. They are also called as character classes.
-
+
```
-
+
\ general escape character
^ negate the class, but only if the first character
- indicates character range
[ POSIX character class (only if followed by POSIX syntax)
] terminates the character class
-
-```
-PCRE provides some generic character types, also called as character classes.
+```
+
+PCRE provides some generic character types, also called as character classes.
+
```
\d any decimal digit
\D any character that is not a decimal digit
@@ -59,24 +62,22 @@ PCRE provides some generic character types, also called as character classes.
## Examples
-We will test our examples on following string `66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"`. It is a standard Apache access log.
+We will test our examples on the following string:
-| Regex | Result | Comment |
-| :---- | :-------------- | :------ |
-| GET | GET | GET matches the characters GET literally (case sensitive) |
-| \d+.\d+.\d+.\d+ | 66.249.64.13 | `\d+` match a digit [0-9] one or more times defined by `+` quantifier, `\.` matches `.` literally |
-| (\d+\.){3}\d+ | 66.249.64.13 | `(\d+\.){3}` is trying to match group (`\d+\.`) exactly three times. |
-| \[.+\] | [18/Sep/2004:11:07:48 +1000] | `.+` matches any character (except newline), `.` is any character |
-| ^\S+ | 66.249.64.13 | `^` means start of the line, `\S+` matches any number of non-space characters |
-| \+[0-9]+ | +1000 | `\+` matches the character `+` literally. `[0-9]` character class means single number. Same can be achieved using `\+\d+` |
-
-All these examples can be tried at https://regex101.com/
+```
+66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"
+```
-1. Copy the example string in `TEST STRING` section
-2. Copy regex code in `Regular Expression` section
-3. The web application will show the matching result
+ It is a standard Apache access log.
+| Regex | Result | Comment |
+| :---- | :-------------- | :------ |
+| `GET` | GET | GET matches the characters GET literally (case sensitive) |
+| `\d+.\d+.\d+.\d+` | 66.249.64.13 | `\d+` match a digit [0-9] one or more times defined by `+` quantifier, `\.` matches `.` literally |
+| `(\d+\.){3}\d+` | 66.249.64.13 | `(\d+\.){3}` is trying to match group (`\d+\.`) exactly three times. |
+| `\[.+\]` | [18/Sep/2004:11:07:48 +1000] | `.+` matches any character (except newline), `.` is any character |
+| `^\S+` | 66.249.64.13 | `^` means start of the line, `\S+` matches any number of non-space characters |
+| `\+[0-9]+` | +1000 | `\+` matches the character `+` literally. `[0-9]` character class means single number. Same can be achieved using `\+\d+` |
## Further Reading
-
-
+[Regex101](https://regex101.com/) - Regular Expression tester and debugger
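
The patterns in the examples table of this change can also be checked locally. The following is a minimal sketch, assuming Python's built-in `re` module as a stand-in: it is not PCRE itself, but it accepts the syntax used by these particular patterns. The names `LOG_LINE`, `PATTERNS`, and `first_match` are illustrative and do not come from the patched file. The IP pattern is written here with escaped dots, as the table's comment describes; the unescaped form shown in the table also matches this string, because `.` matches any character.

```python
import re

# Sample Apache access log line from the examples section of the patch.
LOG_LINE = (
    '66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] '
    '"GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"'
)

# Patterns from the examples table (raw strings keep backslashes intact).
PATTERNS = [
    r"GET",
    r"\d+\.\d+\.\d+\.\d+",
    r"(\d+\.){3}\d+",
    r"\[.+\]",
    r"^\S+",
    r"\+[0-9]+",
]


def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


# Map each pattern to the substring it matches in the sample line.
results = {pattern: first_match(pattern, LOG_LINE) for pattern in PATTERNS}

for pattern, matched in results.items():
    print(f"{pattern!r:28} {matched!r}")
```

Each printed match should correspond to the Result column of the table for the same pattern.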