The latest tutorials and blog posts from PHP Freaks.

Sun, 21 Dec 2008 01:33:12 +0100

Introduction

PHPIDS (PHP-Intrusion Detection System) is a simple-to-use, well-structured, fast and state-of-the-art security layer for your PHP-based web application. The IDS neither strips, sanitizes nor filters any malicious input; it simply recognizes when an attacker tries to break your site and reacts in exactly the way you want it to. Based on a set of approved and heavily tested filter rules, every attack is given a numerical impact rating, which makes it easy to decide what kind of action should follow the hacking attempt. This could range from simple logging to sending out an emergency mail to the development team, displaying a warning message for the attacker, or even ending the user's session.

In a nutshell, PHPIDS is an advanced intrusion detection system written with performance on a large scale in mind. The basic installation and configuration is pretty straightforward.

Requirements
Installation

First we need to download the latest stable release from http://php-ids.org/downloads/ and decompress it. Please note that you do not want public access to the phpids directory; I recommend that you place it above your document root. If you are on a shared host and cannot place it above the document root, the following rewrite rule will prevent unwanted access:

RewriteEngine On
RewriteCond %{REQUEST_URI} ^/phpids(.*)
RewriteRule ^(.+)$ - [F]

Configuration

The basic configuration is extremely simple. By default it comes with several examples. I recommend that you take the time to look at the original Config.ini and browse through the included documentation. It will work "out of the box" with very few edits to the Config.ini. My Config.ini looks like this:

[General]
filter_type = xml
use_base_path = false
filter_path = default_filter.xml
tmp_path = tmp
scan_keys = false
HTML_Purifier_Path = IDS/vendors/htmlpurifier/HTMLPurifier.auto.php
HTML_Purifier_Cache = IDS/vendors/htmlpurifier/HTMLPurifier/DefinitionCache/Serializer
html[] = __wysiwyg
json[] = __jsondata
exceptions[] = __utmz
exceptions[] = __utmc
min_php_version = 5.1.2
[Logging]
path = tmp/phpids_log.txt
recipients[] = me@domain.com
subject = "PHPIDS detected an intrusion attempt!"
header = "From: <PHPIDS> noreply@domain.com"
envelope = ""
safemode = true
allowed_rate = 15
[Caching]
caching = file
expiration_time = 600
path = tmp/default_filter.cache

Now we need to write a simple PHP script to enable PHPIDS. I used one of the included examples with minor modifications, saved as ids.php:
Now we will use PHP's auto_prepend_file directive to prepend the ids.php script from above to all other PHP scripts. You can do this by adding the following line to your php.ini:

auto_prepend_file = /full/path/to/ids.php

Or, with an .htaccess file (this only works when PHP runs as an Apache module), we can do something like:

php_value auto_prepend_file /full/path/to/ids.php

Thu, 04 Dec 2008 22:00:57 +0100

Just a short notice. Our RSS feeds now contain entire blog posts and tutorials instead of descriptions, so you can read the whole thing in your RSS reader if you wish.

Feeds:

- Daniel

Thu, 04 Dec 2008 00:40:28 +0100

So you have some tabular data printed out in your browser. You can even change the order of the information by clicking on the column name at the top of your list. But can you make your own custom list order? Or maybe you're deciding to make a content management system (CMS). You're making your features modular, so it's easier to add/remove modules. How do you go about displaying them in a custom order in the browser? Or, to illustrate with simple numbers: if you have a list ordered 1 2 3 4 5 and you want to somehow make it 1 3 2 5 4 or 1 2 3 5 4 or 1 5 2 4 3 or whatever else, then you've come to the right place.

Just want to say up front that yes, I am aware that there are several Ajax or other "web 2.0" methods, frameworks, etc. that offer this sort of thing. You can drag and drop rows and it's flashy, and no, that's not what you're going to get out of this tutorial. This is straight PHP. No bells and whistles and warm fuzzy little kitties to jump up and down and purr about what a great job you're doing.

Set Up the Database

Since data is *usually* stored in a database, we are going to store our custom list order in a column in a SQL table. I'm using MySQL just because that's what I use, but as far as I know, all of the queries should work with any SQL database. First things first, let's set up a table. Create a table named 'info' with a column named 'usort' of type int and a column named 'name' of type varchar(10).
Here is a query for that:
And here is another query to populate it with some data:
Note: The values in the usort column need to be unique in order for this code to work. However, DO NOT make this column a primary key, auto_increment, or unique. If you do, the query that swaps usort values will yell and scream at you about duplicate entries. But that's okay, because usort is not meant to replace the row id, which you will probably have, and which will probably already be unique, auto_incremented, etc.

The Plan

Okay, now that we have our table set up with some data to work with, let's talk about the script. The goal of this example script is to display the data from the table. We will make the column names links so that we can sort by column name. More importantly, we will create 'up' and 'down' arrows for each row so that we can order the list however we want. When you click an up or down arrow, the script will send a query to the database to swap the row's usort number with that of the row above or below it.

The Code

Here is the whole script. The next few pages will break it down bit by bit.
Connect to db
First thing to do is connect to the database. Nothing fancy here; insert your own info where appropriate.

If an arrow was clicked...
Okay, next we want to check whether an arrow was clicked. We do this by checking for two GET variables. One variable ('dir') tells us which direction to swap the number (up or down), and the other variable ('id') tells us what the number is. We assign them to 'regular' variables for easier coding. $id is cast to type int to keep people from entering half numbers or things other than numbers, which also prevents possible SQL injection attacks through that variable (you always have to be security conscious).

Up or Down
$dir tells us which way we want to swap: up or down. For example, if we have a list of 1,2,3, swapping the 2 up will change the list to 2,1,3. Swapping the 2 down will change the list to 1,3,2.

List   Swapping up   Swapping down
1      2             1
2      1             3
3      3             2

We will use a switch to decide what to do if $dir is up or down. If $dir == 'up', we use a ternary operator to make sure that the current row is greater than 1, so that there is something above the row to swap with. If there is, we assign the previous row's number to $swap by subtracting 1. If there is no row above the current row (it's already at the top of the list), we assign 1 to it.

How do we know that the row above it is 1 less than its current number? Because all of the usort values are supposed to be unique and consecutive. 5 rows == 5 numbers, 1-5. There will always be 1,2,3,4,5. Ordering by usort will always give 1-5 (or however many you have). You have to program it that way, or it won't work. Well, I take that back. It won't work for this code example. You can get fancy and run a query to find out what the one above or below will be, regardless of whether it's exactly 1 away or not, but we aren't gonna get all complicated with it.

Going Down on...the List
Swapping down a row works on the same principle as swapping up a row, except we're like, going down, instead of up. First we do a select count(*) to find out how many rows we have. Yes, this will be done on every page load. Yes, we could save it in a cookie or session var or pass it through the URL (though I'd say no to the GET method for this var anyway, for security reasons), but to keep things simple, we're just going to select every time. Grab the result and put it in a var, $max. Use a ternary to check whether there is a row to swap down to. If there is, add 1 to the current row's id. If not, assign it $max.
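The up/down decision above can be sketched as a small PHP function so you can try it without a database. This is a sketch, not the tutorial's original script: the name computeSwap() is hypothetical, the request values arrive as a plain array standing in for $_GET, and $max stands in for the SELECT COUNT(*) result. It also covers the fallback described next, where $swap stays equal to $id for an unrecognized direction.

```php
<?php
// Hypothetical helper mirroring the switch/ternary steps described above.
// $get stands in for $_GET; $max stands in for the SELECT COUNT(*) result.
// Returns [id, swap], or null when no arrow was clicked.
function computeSwap(array $get, int $max): ?array
{
    if (!isset($get['dir'], $get['id'])) {
        return null; // no arrow clicked, nothing to swap
    }

    $dir = $get['dir'];
    $id  = (int) $get['id']; // "3.7" becomes 3, "abc" becomes 0

    $swap = $id; // default: an unknown direction swaps the row with itself

    switch ($dir) {
        case 'up':
            // Row 1 is already at the top, so it stays put.
            $swap = ($id > 1) ? $id - 1 : 1;
            break;
        case 'down':
            // Row $max is already at the bottom, so it stays put.
            $swap = ($id < $max) ? $id + 1 : $max;
            break;
    }

    return [$id, $swap];
}

echo implode(',', computeSwap(['dir' => 'up', 'id' => '3'], 5)), "\n";       // 3,2
echo implode(',', computeSwap(['dir' => 'down', 'id' => '5'], 5)), "\n";     // 5,5
echo implode(',', computeSwap(['dir' => 'sideways', 'id' => '4'], 5)), "\n"; // 4,4
```

In the real script you would pass $_GET and the row count from the database instead of the literal arrays.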
The default for $swap will be $id. This is in case someone decides to enter dir=somethingotherthanupordown into the URL.

2 Card Monte...
Now that we know which two numbers we're going to swap, we will run our query to swap them in the database. Basically the query says this: for every number in the IN(...) list, we are going to run a condition on it. If it equals one thing, we assign it the other thing. If it equals the other thing, we assign it the first thing. Or, a more 'php' way of saying it:

foreach ($list as &$row) {
    if ($row == $x) {
        $row = $y;
    } elseif ($row == $y) {
        $row = $x;
    }
}
unset($row); // break the reference that foreach leaves behind

You: "Now wait just a minute there... how can the database do that? I understand doing update table set column = $x where column = $y, but you can't just turn around and do the same thing for the 2nd row, because the first row has already been updated! There's no number to search for! wtf??"

Me: MAGIC. You heard me: MAGIC. For real.

Okay, for really real: it's a single UPDATE statement, and the CASE condition for each row is checked against that row's own current value. The database never has to go looking for the number the first row just received; each row simply gets its new value computed from its old one. However, MySQL writes the changed rows one at a time and checks unique indexes as it goes, so between the time the first row gets updated and the time the 2nd row gets updated, both rows contain the same number. That's why you'll get a duplicate entry error if you index that column as some flavor of unique.

Sorting the data
Pretty straightforward. We want to be able to click a column name to sort the data by that column, so we assign a column name to $sortby to use in the query. Since we only have two columns to sort by, we use another ternary to assign one or the other. This ensures that $sortby will only ever be 'name' or 'usort' and not something else, like another SQL injection attempt. We then run the query to get the data to display.

Display the info
Next we will start an HTML table, making the first row the column names. We make them links, passing the column name through the URL, so the script knows which column to order the results by, should you click on one.
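The two queries and the header row can likewise be sketched as plain string builders, so the logic can be checked without a database. The tutorial's actual SQL is not reproduced in this copy, so the CASE form of the UPDATE below is a reconstruction from the description of the IN(...) list and the swap conditions; the function names and the 'sortby' GET parameter are hypothetical. Both values in the UPDATE are ints, which is what makes interpolating them safe, and the ORDER BY column can only ever be one of two hard-coded names.

```php
<?php
// Hypothetical helpers; table `info` and columns `usort`/`name` come from the tutorial.

// Swap two usort values in one statement. Each row's new value is computed
// from that row's own old value via CASE.
function buildSwapQuery(int $id, int $swap): string
{
    return "UPDATE info SET usort = CASE usort"
         . " WHEN $id THEN $swap"
         . " WHEN $swap THEN $id"
         . " END WHERE usort IN ($id, $swap)";
}

// Whitelist the sort column: anything except 'name' falls back to 'usort'.
function buildSelectQuery(?string $requested): string
{
    $sortby = ($requested === 'name') ? 'name' : 'usort';
    return "SELECT usort, name FROM info ORDER BY $sortby";
}

// First table row: each column name links back with ?sortby=<column>.
function headerRow(): string
{
    $html = '<tr>';
    foreach (['usort', 'name'] as $col) {
        $html .= '<th><a href="?sortby=' . urlencode($col) . '">'
               . htmlspecialchars($col) . '</a></th>';
    }
    return $html . '</tr>';
}

echo buildSwapQuery(3, 2), "\n";
// UPDATE info SET usort = CASE usort WHEN 3 THEN 2 WHEN 2 THEN 3 END WHERE usort IN (3, 2)
echo buildSelectQuery('name; DROP TABLE info'), "\n";
// SELECT usort, name FROM info ORDER BY usort
echo headerRow(), "\n";
```

The whitelist is the important design choice here: column names cannot be bound as query parameters, so restricting them to known values is the usual way to keep ORDER BY safe.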
Finally, we use a while loop to loop through and display the results from the data-pulling query. We make some up arrow and down arrow links, passing the direction to swap as well as the usort number for that row, so the script knows which rows to swap, should you click one of those links. The name is displayed as plain text, because we aren't doing anything with it. Close the table after the loop, and we're done.

The End

Well, there you have it: a custom list order method made easy. May you find this of some use in your coding endeavors.

Crayon Violent

Wed, 03 Dec 2008 13:03:35 +0100

Due to the high costs of keeping this site online, it has been decided to put advertisements on the forums. If you wish to hide the ads, please consider donating instead of using ad-blocking tools, as all advertisements are hidden for people in the PHP Freaks Donator group.

Update: You need to clear your cache to get the new stylesheet so the advertisements will be positioned correctly.

Sun, 12 Oct 2008 22:36:28 +0200

Table of Contents
1. Regular Expressions Basics

Regular Expressions Basics
What are Regular Expressions?

How are they different from regular string searching functions?
That's where the power of regexes comes into play. They have amazing capabilities in terms of analyzing strings for certain patterns and matches.

Why bother? I can make parsing routines that do the job just fine.

Well, now it's time for you to learn how to create regular expressions of your very own.

Note: This tutorial covers only regular expression syntax. Actual pattern matching, substitution, and handling of results in PHP will be covered in subsequent tutorials. Also, this tutorial set covers Perl-Compatible Regular Expressions (PCRE), for reasons which will be discussed later.

Creating Your Own Patterns

Before we start interpreting regular expression syntax, it would be a good idea to first see what a pattern looks like. A pattern must ALWAYS have an opening delimiter and an ending delimiter. The delimiters "enclose" the pattern so the engine knows where it stops, and they separate the pattern from any modifiers, which will be discussed later on in the tutorial. The most commonly seen delimiter is /, but it's often advisable to use some obscure character that will never end up in one of your patterns (such as `, #, or !). For most of the examples in this tutorial, I'll be using / as my delimiter, simply because it's conventional, but in PHP you can use almost any character that isn't alphanumeric, a backslash, or whitespace.

Now that we have delimiters out of the way, let's look at one of the simplest patterns:

/abc/

When a plain letter is shown in a regular expression, it's interpreted as just that: a plain letter. This pattern would match ANY string containing 'abc' in that order. Not very useful, but it illustrates some basic principles. Notice how the part you want matched ('abc') is contained within the delimiters? That's how every pattern has to be formatted; otherwise it simply won't work.

More Than Plain Letters

The following characters are metacharacters, which have special meanings in a pattern:

\ | . ( ) [ ] { } ^ $ * + ?

In order to actually match these characters literally, you'd need to add a backslash before them.
So, by that statement, in order to match a literal ., you could use the following pattern:

/\./

Notice once again the delimiters (/ /), and the backslash which comes before the metacharacter to "de-meta" it. An interesting point to note is that in order to match a literal backslash, you'd need to put 2 backslashes, since the backslash itself is a metacharacter:

/\\/

How Metacharacters Work

This section is completely devoted to an in-depth explanation of each of the metacharacters, and how they act when used in a pattern.

The Catch-All Metacharacter (.)

/c.t/

This pattern would match cat, cut, cbt, cet, czt, c4t, and so on, but not caat, because there's only one dot (.) in the pattern. It's important to note that . only matches ONE character, until quantifiers are introduced.

The Anchoring Metacharacters (^ $)

/^Z/

The ^ character matches the beginning of a string, so this pattern matches any string that starts with a Z. The $ character is essentially the same as the ^ character, except it matches the end of a string. To make sure that a string ends with a g, you'd use:

/g$/

Note how the $ comes after the g, unlike in the pattern with ^. This is because the pattern is literally telling the regex engine "a g followed by the end of the string boundary", not the other way around. In fact, the other way around would make little sense, because you can't have characters after the end of a string. The only way that characters can appear after a $ is if your regular expression is in multi-line mode. It will be discussed in depth later on in this tutorial, but keep in mind that the meaning of ^ and $ can change sometimes.

These two metacharacters can also be used simultaneously in a pattern, and in fact, that's often how they are used. It would probably help to show an example:

/^abc$/

This pattern, after being taken apart, is quite simple. It says that the string must begin, then match 'abc', then end. Basically, the string must be 'abc' in order to match. (Strictly speaking, $ also matches just before a newline at the very end of the string, so 'abc' followed by a newline would match too.)
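If you want to try these patterns as you read, PHP's preg_match() is the quickest way: it returns 1 when the pattern matches somewhere in the subject and 0 when it doesn't. The subject strings below are made up for illustration. Keep in mind that the pattern is an ordinary PHP string, so a regex backslash that must survive PHP's own quoting has to be doubled: the literal-backslash pattern /\\/ is written '/\\\\/' in single quotes.

```php
<?php
// preg_match() returns 1 if the pattern matches anywhere in the subject,
// 0 if it does not (and false if the pattern itself is invalid).

// Plain letters: 'abc' in that order, anywhere in the string.
var_dump(preg_match('/abc/', 'xxabcxx'));   // int(1)
var_dump(preg_match('/abc/', 'acb'));       // int(0)

// The delimiter is up to you; this behaves exactly like /abc/.
var_dump(preg_match('#abc#', 'xxabcxx'));   // int(1)

// Escaping: \. is a literal period, while a bare . matches any character.
var_dump(preg_match('/\./', 'end'));        // int(0)
var_dump(preg_match('/./', 'end'));         // int(1)

// A literal backslash: the regex is /\\/, written '/\\\\/' in PHP quotes.
var_dump(preg_match('/\\\\/', 'C:\\dir'));  // int(1)

// The dot stands for exactly ONE character.
var_dump(preg_match('/c.t/', 'c4t'));       // int(1)
var_dump(preg_match('/c.t/', 'caat'));      // int(0)

// Anchors: ^ is the start of the string, $ is the end.
var_dump(preg_match('/^Z/', 'Zebra'));      // int(1)
var_dump(preg_match('/^Z/', 'AZ'));         // int(0)
var_dump(preg_match('/g$/', 'ring'));       // int(1)
var_dump(preg_match('/^abc$/', 'abc'));     // int(1)
var_dump(preg_match('/^abc$/', 'abcd'));    // int(0)
```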
The Grouping Metacharacters (( ))

Any example shown right now would be rather pointless, but here's one:

/(abc)/
Right now, this would match the same as /abc/, but that's going to change after the explanation of other metacharacters.
The Quantifying Metacharacters ( * + ? { } )

The * quantifier says that the preceding subpattern must appear 0 or more times, which basically means that it can appear, and if it does, it doesn't matter how many times. It's often used to account for random whitespace in a string, but it has other uses as well. For now, let's look at a simple pattern:

/ab*c/

This pattern would match ac, abc, abbc, abbbc, etc. If you want * to operate on more than one character, you'd need to use those grouping metacharacters that were mentioned earlier (I told you they'd come in handy!):

/a(bcd)*/

This would match a, abcd, abcdbcd, abcdbcdbcd, and so on.

The + quantifier operates just like *, except it dictates that the preceding subpattern must appear one or more times. It tells the Engine that a certain subpattern must appear, and if it does, it can repeat indefinitely.

/c.+t/

Here, you can see some of the real power of quantifiers. They can be used with any character, including metacharacters like the dot in this case. This pattern would match cat, caaaaat, cbbbajsduasuut, cjkallskt, etc.

The ? quantifier makes the preceding subpattern optional, meaning it can appear zero or one times.

/a(bcd)?e/

This would match either abcde or ae, because the ? makes the (bcd) subpattern optional.

The { and } metacharacters are used to specify even more exact quantities for subpatterns. They have several different syntax options to accomplish different things, and they are as follows:

/a(bcd){2}e/ #matches abcdbcde because {2} specifies EXACTLY 2 matches
/a(bcd){2,3}e/ # matches abcdbcde or abcdbcdbcde because {2,3} means 2 or 3 matches, inclusive
/a(bcd){2,}e/ # matches any string with a, 'bcd' repeated AT LEAST 2 times, and an e. {2,} represents a minimum

By the way, you can't just specify a maximum without a minimum (like {,2}). If you want only a maximum, use a minimum of zero: {0,max}. Also, an interesting note is that all of the other quantifiers can be represented in terms of { }:

/a(bcd)*e/ is equal to /a(bcd){0,}e/
/a(bcd)+e/ is equal to /a(bcd){1,}e/
/a(bcd)?e/ is equal to /a(bcd){0,1}e/

The *, +, and ? quantifiers are often preferred for readability, though.

It's important to note that without the anchoring metacharacters, ^ and $, a pattern will report a match even if there are other characters present in the string. For example, in the following string:

drtabcabcabcpdl

the following pattern would indeed find a match:

/ab*c/

If you wanted to ensure that a string contains ONLY a certain pattern, you'd need to anchor it:

/^ab*c$/

Now the pattern would only match a, any number of b's, and a c.

The Alternation Metacharacter (|)

/(yes)|(no)/

That would match either 'yes' or 'no' in its entirety. Here the parentheses don't change what matches (alternation has the lowest precedence, so /yes|no/ behaves the same), but grouping matters a great deal as soon as the alternation is part of a longer pattern. It's highly recommended to keep track of how you group things, since it can completely change how the Engine looks at a pattern. Just to help you visualize an example of where grouping in alternation is important, I'll show you the following pattern:

/prob|n|r|l|ate/

This pattern would actually match 'prob', 'n', 'r', 'l', or 'ate'. If you wanted to match probate, pronate, prorate, and prolate, you'd use:

/pro(b|n|r|l)ate/

The Character Class Metacharacters ([ ])

/c[aeiou]t/

This regex would match cat, cet, cit, cot, and cut, because the [aeiou] class contains those characters in between the c and the t. If I changed the character class to [au], it would only match cat and cut. Now, that's not to say that you couldn't accomplish the same thing with the alternation metacharacter, but it becomes very unwieldy, and most of the "cool" functionality of character classes (which will be covered right after this) can't be achieved with it. The previous pattern could have been written as:

/c(a|e|i|o|u)t/

But who actually wants to type that? The cool part about character classes is ranges. Inside a character class, you can specify ranges (separated by a -) to match.
If you wanted to match a string containing any 5-digit number, you could write this pattern (without anchors it will also match the first five digits of a longer number):

/([0-9]{5})/

Acceptable ranges are a-z, A-Z, 0-9 (or smaller pieces of those, like a-f), and some other ranges involving the actual "value" of certain characters, but that's a bit advanced. Another important thing to mention is that ranges can be "stacked" inside a single class. The following example illustrates that:

/^[a-zA-Z0-9_]+$/

That pattern dictates that a string must consist of one or more characters, all of them alphanumeric or the underscore (_), due to the anchors (^ and $), the quantifier (+), and the character class ([a-zA-Z0-9_]).

There are also "shortcut" character classes which the Engine understands automatically. They are as follows:

/\d/ #matches any digit
/\D/ #matches any NON-DIGIT
/\w/ #matches any word character (which includes the underscore and digits, so it's like [a-zA-Z0-9_])
/\W/ #matches any NON-WORD character
/\s/ #matches any whitespace character like a literal space, a tab, and a newline
/\S/ #matches any NON-WHITESPACE character
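The same preg_match() check works for the quantifiers, alternation, and character classes covered so far. The subjects are invented examples; most patterns are anchored with ^ and $ so that a result of 1 means the whole string fits the pattern. preg_match_all() is used once at the end because it returns how many matches it found.

```php
<?php
// Quantifiers (anchored, so the whole subject must fit).
var_dump(preg_match('/^ab*c$/', 'ac'));                 // int(1): zero b's is fine
var_dump(preg_match('/^ab*c$/', 'abbbc'));              // int(1)
var_dump(preg_match('/^c.+t$/', 'ct'));                 // int(0): + needs at least one
var_dump(preg_match('/^a(bcd)?e$/', 'ae'));             // int(1)
var_dump(preg_match('/^a(bcd){2}e$/', 'abcdbcdbcde'));  // int(0): three, not two
var_dump(preg_match('/^a(bcd){2,3}e$/', 'abcdbcdbcde'));// int(1)

// Without anchors, a match inside a longer string still counts.
var_dump(preg_match('/ab*c/', 'drtabcabcabcpdl'));      // int(1)
var_dump(preg_match('/^ab*c$/', 'drtabcabcabcpdl'));    // int(0)

// Alternation: grouping decides what the | choices are.
var_dump(preg_match('/^pro(b|n|r|l)ate$/', 'prolate')); // int(1)
var_dump(preg_match('/prob|n|r|l|ate/', 'rug'));        // int(1): a lone 'r' is enough

// Character classes, ranges, and shortcuts.
var_dump(preg_match('/c[aeiou]t/', 'cut'));             // int(1)
var_dump(preg_match('/c[au]t/', 'cot'));                // int(0)
var_dump(preg_match('/[0-9]{5}/', '123456'));           // int(1): 6 digits contain 5
var_dump(preg_match('/^[0-9]{5}$/', '123456'));         // int(0)
var_dump(preg_match('/^[a-zA-Z0-9_]+$/', 'user-42'));  // int(0): '-' is not in the class
var_dump(preg_match('/^\w+$/', 'user_42'));             // int(1)
var_dump(preg_match('/\s/', "tab\there"));              // int(1): the tab
var_dump(preg_match_all('/\d/', 'a1b22c333'));          // int(6): one per digit
```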
These shortcuts can be used both inside and outside of actual character classes, meaning they can appear anywhere in a pattern. For example, the 5-digit pattern from above could be written as:

/\d{5}/

Learn the shortcuts, as they'll help you a lot when you're actually writing patterns of your own. An example of using a shortcut inside a character class would be:

/^[a-zA-Z\s]+$/

This pattern would match a string made up only of letters (a-z, A-Z) and whitespace characters.

There are also some other tiny nuances with character classes that you should really familiarize yourself with. If a character class starts with a ^, the ^ no longer means the beginning of the string; instead, it acts like ! (NOT) does in PHP. It negates the character class.

/c[^au]t/

That would match cbt, c$t, c!t, crt, etc., but not cat and cut. Another interesting point to mention is that the . loses its metacharacter properties inside a character class, meaning you can use it as a literal period inside a class.

Metacharacter Conclusion

That just about sums it up for the metacharacters (there's still some advanced syntax involving a few metacharacters, but it's nothing to worry about yet). Remember that all of these metacharacters can be used together in a pattern, giving you full control over how you match your string.

Quantifier Greediness

I felt that this topic deserved a page of its own. When using quantifiers, there is a concept known as greediness (and its opposite, laziness) which can cause a lot of confusion for newcomers to regular expressions.

What is greediness? Say you have the string 'exasperate', and you run the following pattern on it:

/e(.*)e/

Believe it or not, that (.*) actually matches xasperat instead of xasp as you may have thought. The regular behavior for quantifiers is to gobble up as many characters as they possibly can, hence greediness. A quantifier wants to grab as many characters as it can possibly get away with and still match. That's why it goes right past the second e in exasperate and keeps on matching until it reaches the last possible e it can.
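Here are the character class nuances and the greedy (.*) behavior from above, checked with preg_match(). The optional third argument, $m, receives the captured groups, so $m[1] holds whatever the first (...) group matched. The test strings are illustrative.

```php
<?php
// A shortcut inside a class: letters and whitespace only.
var_dump(preg_match('/^[a-zA-Z\s]+$/', 'Hello World'));   // int(1)
var_dump(preg_match('/^[a-zA-Z\s]+$/', 'Hello World!'));  // int(0)

// A leading ^ negates the class.
var_dump(preg_match('/c[^au]t/', 'c$t'));  // int(1)
var_dump(preg_match('/c[^au]t/', 'cat'));  // int(0)

// Inside a class the dot is just a period.
var_dump(preg_match('/^[.]$/', '.'));      // int(1)
var_dump(preg_match('/^[.]$/', 'x'));      // int(0)

// Greediness: (.*) runs on to the LAST e that still allows a match.
preg_match('/e(.*)e/', 'exasperate', $m);
echo $m[1], "\n";                          // xasperat
```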
Making the Match Lazy

/e(.*?)e/

Putting a ? after a quantifier makes it lazy: it tells the engine to take as much as it needs to succeed on the match, and nothing more.

Another way that you might see greediness being countered is by using negative character classes, but that only works if there's only one character you want to prevent greediness from. For example, suppose that in an HTML-matching pattern you wanted to get the text in a certain <p> tag, which just so happens to be followed by another <p>, like in this tiny snippet:

<p id="test">p1</p> <p id="test">p2</p>

You could write your pattern like this:

!<p id="test">(.+)</p>!

This would, not surprisingly, gobble up BOTH <p> tags, even though it really mismatches the closing tag. The Engine doesn't realize that, and it likes being greedy, so it does. You could rewrite the pattern as:

!<p id="test">(.+?)</p>!

Or, you could use a negative character class and say:

!<p id="test">([^<]+)</p>!

The latter is often slightly quicker in terms of execution speed, but can be more difficult to understand. Another great use for negative character classes is getting the value of a specific HTML tag's attribute. The following pattern would get an img tag's src attribute:

!<img(.+?)src="([^"]+)"(.*?) />!

Now, that looks like a really complicated pattern, but when you break it down, it's actually quite simple. First, you match the literal text '<img'. Then, you match one or more characters (non-greedy) until you reach the src attribute. Then, you use a negative character class ([^"]+) to grab everything up to the closing ". Then you have any amount of characters, followed by the standard way to close an img tag.

The concept of greediness and laziness is very important to learn in order to get the results you want when we actually get to using the patterns matched (that's the subject of the next tutorial, actually). Re-read this page as many times as it takes until you completely understand the concept.
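The lazy quantifier and the two <p> and img examples from this page can be run as-is with preg_match(); the captured group shows exactly how much each version grabbed. The img tag here is a made-up sample, not from the original tutorial.

```php
<?php
// Lazy quantifier: stop at the FIRST e that lets the match succeed.
preg_match('/e(.*?)e/', 'exasperate', $lazy);
echo $lazy[1], "\n";                    // xasp

$html = '<p id="test">p1</p> <p id="test">p2</p>';

// Greedy: runs past the first </p> to the last one.
preg_match('!<p id="test">(.+)</p>!', $html, $greedyP);
echo $greedyP[1], "\n";                 // p1</p> <p id="test">p2

// Lazy and negated-class versions both stop at the first </p>.
preg_match('!<p id="test">(.+?)</p>!', $html, $lazyP);
preg_match('!<p id="test">([^<]+)</p>!', $html, $classP);
echo $lazyP[1], ' ', $classP[1], "\n";  // p1 p1

// Grabbing an img tag's src attribute (sample tag made up for this demo).
$img = '<img alt="logo" src="/images/logo.png" width="10" />';
preg_match('!<img(.+?)src="([^"]+)"(.*?) />!', $img, $src);
echo $src[2], "\n";                     // /images/logo.png
```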
Pattern Modifiers

Remember when I was talking about reasons why a regular expression must have delimiters? It's not only to contain the regular expression, but also to allow for special pattern modifiers that go after the ending delimiter. These allow you to modify how the Engine actually views your pattern. In this section, I'm going to go over the modifiers that are commonly used in matching regexes. In the next tutorial, when substitution is covered, I'll go over the modifiers that work for substitution, as there are some differences.

The Insensitivity Modifier (i)

/super/

That would match 'super', but not 'SuPeR' or 'SUPER'. In order to have it match all of those possibilities, you could use a lot of alternation or some clever character classes, or you could just apply the i modifier:

/super/i

Note how the i goes AFTER the closing delimiter.

The Newline Match Modifier (s)

something
//start
STUFF!
some more stuff...
//end

If you wanted everything between those two comments (//start and //end), you would write your regex like this:

!//start(.+?)//end!s

Now, pay close attention to what I did. Since I actually needed to use the / character in my pattern, it made no sense to use it as the delimiter, because then it would need to be escaped every time I used it (/\/\/start(.+?)\/\/end/s), which is incredibly hard to read, so I used an exclamation point as the delimiter. Normally the . does not match a newline; the s modifier on the end changes that, allowing the pattern to match all of the stuff in between the two comments even though it spans separate lines.

The Multiline Mode Modifier (m)

/^\d{5}$/m

With the m modifier, ^ and $ match at the start and end of every line instead of only at the start and end of the whole string, so this pattern finds each line that consists of exactly five digits.

The Freespace Modifier (x)

/\b(\w\S+)(\s+\1)+\b/i

can be written as:

/
\b          #word boundary
(\w\S+)     #word "chunk"
(
  \s+       #whitespace
  \1        #same word "chunk"
) +         #repeat if necessary
\b          #boundary
/xi

Certainly more readable, right? Whitespace in the pattern is ignored, and comments can be placed at the end of a line with a # followed by your remarks.

Where to Find More Modifiers

PCRE vs. POSIX

This is more of a technical part of the tutorial that I felt should be covered (thanks zanus!). There are actually two "flavors", if you will, of regular expressions supported by PHP: Perl-Compatible Regular Expressions (PCRE) and POSIX Extended regular expressions. PCRE is much more robust than POSIX and can do many things that POSIX can't even come close to. They're VERY similar in syntax (for the most part, until you get to advanced syntax, which will be in a later tutorial), but PCRE has a lot more functionality. I thought that I'd outline some of those differences here, so you know why you should most certainly learn PCRE over POSIX.

Binary data
Speed
Modifiers and Delimiter
Deprecation in PHP6
Usability in Perl

Putting it All Together

Well, now that we have the basic syntax out of the way, I'm going to put some example patterns up for you to analyze, and then show you exactly what they do.

/^\w+:(\s+\w+)\s+\d+$/m
Matches a word, a colon, whitespace, a word, whitespace, and some digits on every line

/pro(b|n|r|l)ate/i
Matches probate, pronate, prorate, and prolate, without case sensitivity

/^[+-]?\d+$/
Matches an integer which can have + or - (or nothing) in front of it. It can also have leading zeros, because 0 is included in \d.

~//start\n(.+?)\n//end~is
Matches //start, a newline, one or more characters on any number of lines (s modifier), a newline, and //end, without case sensitivity (i modifier)

These are just some example patterns. When I get into advanced syntax, you'll be able to create much more intricate patterns, such as:

/(\d)(\d{3})(?!\d)/
replacement: $1, $2

I hope some of these basic patterns have gotten you more interested in pursuing regular expressions. ;)

Conclusion and Future Tutorials

This tutorial was meant to be a comprehensive tutorial on the most basic regular expression syntax. There are still many more advanced concepts, but those will be the subject of maybe the 3rd or 4th tutorial in this set. The next tutorial will involve actually applying these patterns in PHP: creating matches, using the grouping metacharacters to create match groups, substitution, and some other concepts directly related to regex use in PHP. Then, I'll cover all of the advanced concepts in order to help you create more efficient and more specific patterns.

Just to show you the utility of regular expressions, I actually had to use one to correct some of the tags that I used to show you the regular expressions. It looked a bit like this:

!\[code(=php)?\](.+?)\[/code\]!is

That found all of the code tags for me, so I could easily use a replacement pattern to make them into the proper tags. ;)
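Finally, the pattern modifiers and a few of the "Putting it All Together" patterns can be checked the same way. preg_match_all() returns the number of matches it found, which makes the effect of the m modifier easy to see. All subject strings here are invented for illustration; note that the comments in the x-mode pattern must not contain the / delimiter.

```php
<?php
// i: case-insensitive matching.
var_dump(preg_match('/super/', 'SuPeR'));    // int(0)
var_dump(preg_match('/super/i', 'SuPeR'));   // int(1)

// s: without it, . never matches a newline.
$text = "something\n//start\nSTUFF!\nsome more stuff...\n//end";
var_dump(preg_match('!//start(.+?)//end!', $text));   // int(0)
var_dump(preg_match('!//start(.+?)//end!s', $text));  // int(1)

// m: ^ and $ apply to every line instead of the whole string.
$zips = "12345\nabcde\n67890";
var_dump(preg_match_all('/^\d{5}$/', $zips));   // int(0)
var_dump(preg_match_all('/^\d{5}$/m', $zips));  // int(2)

// x: whitespace and # comments inside the pattern are ignored.
$doubled = '/
    \b          # word boundary
    (\w\S+)     # word "chunk"
    (
      \s+       # whitespace
      \1        # same word "chunk"
    )+          # repeat if necessary
    \b          # boundary
/xi';
preg_match($doubled, 'This is is a test', $m);
echo $m[0], "\n";                               // is is

// A few of the "Putting it All Together" patterns.
var_dump(preg_match('/^[+-]?\d+$/', '-0042'));  // int(1)
var_dump(preg_match('/^[+-]?\d+$/', '4.2'));    // int(0)
var_dump(preg_match_all('/^\w+:(\s+\w+)\s+\d+$/m', "name: bob 42\nbad line\nage: x 7")); // int(2)

// The [code] tag finder from the conclusion (i makes [CODE] match too).
$post = '[code=php]echo 1;[/code] and [CODE]ls[/CODE]';
preg_match_all('!\[code(=php)?\](.+?)\[/code\]!is', $post, $tags);
echo implode(' | ', $tags[2]), "\n";            // echo 1; | ls
```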