PHP RegEx Help...

Discussion in 'Web Design & Coding' started by SPeedY_B, Mar 14, 2008.

  1. SPeedY_B

    SPeedY_B I may actually be insane.

    Messages:
    15,800
    Location:
    Midlands, England
    I have a full HTML document stored in a variable named $buffer. Within the document there are multiple instances of the following (the asterisks in the line below are random numbers):
    Code:
    <a href="showMessage.do?mmsId=********&inboxItemId=********" title="View message">
    I need to grab just the href value, i.e. showMessage.do?mmsId=********&inboxItemId=********. I gather the easiest way of doing this is with preg_match_all()? As I'm useless with the expressions side of things, though, I could do with some help.
     
  2. LordOfLA

    LordOfLA Godlike!

    Messages:
    7,027
    Location:
    Maidenhead, Berkshire, UK
    /showMessage\.do\?mmsId\=(0-9)*\&inboxItemID\=(0-9)*/

    Pop that in and see what you get. I'm not too hot at regex without a syntax guide, but that should get you started.
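    For what it's worth, (0-9) in that pattern is a capturing group that matches the literal text "0-9", not a digit range; a character class such as [0-9] is what matches digits. A small sketch on a made-up sample in the format from the first post:

```php
<?php
// Made-up sample in the format from the first post.
$sample = 'showMessage.do?mmsId=11111111&inboxItemId=22222222';

// "(0-9)*" is a capturing group matching the literal text "0-9", repeated
// zero or more times; here it repeats zero times, so only "mmsId=" matches.
$groupCount = preg_match('/mmsId=(0-9)*/', $sample, $groupMatch);

// "[0-9]+" is a character class: one or more digits.
$classCount = preg_match('/mmsId=([0-9]+)/', $sample, $classMatch);

echo $groupMatch[0], "\n";  // mmsId=
echo $classMatch[1], "\n";  // 11111111
```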
     
  3. Geffy

    Geffy Moderator Folding Team

    Messages:
    7,805
    Location:
    United Kingdom
    Yep, LordOfLA is sort of on the right track, except you said you wanted to capture the href portion, not just the IDs.

    Code:
    /href=["'](showMessage\.do\?mmsId=[0-9]+\&inboxItemID\=[0-9]+)["']/
    This will catch the URL provided it's in an href attribute and wrapped in either single or double quotes.

    PHP:
    preg_match_all('/href=["\'](showMessage\.do\?mmsId=[0-9]+\&inboxItemID\=[0-9]+)["\']/', $buffer, $matches);
    Then all the matches should be in the $matches array; just print_r the variable and see what you get.
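    With the default PREG_PATTERN_ORDER flag, $matches[0] holds every full match (including href= and the quotes) and $matches[1] holds what the parentheses captured; passing PREG_SET_ORDER groups the results per match instead. A sketch with two made-up links, with the parameter name written exactly as it appears in the first post's HTML:

```php
<?php
// Two made-up links; note the different quote styles.
$html = '<a href="showMessage.do?mmsId=1&inboxItemId=2">'
      . "<a href='showMessage.do?mmsId=3&inboxItemId=4'>";

$pattern = '/href=["\'](showMessage\.do\?mmsId=[0-9]+&inboxItemId=[0-9]+)["\']/';

// Default PREG_PATTERN_ORDER: $byPattern[0] lists every full match,
// $byPattern[1] lists every capture of group 1.
$found = preg_match_all($pattern, $html, $byPattern);

// PREG_SET_ORDER: one array per match, so $bySet[0][1] is group 1
// of the first match and $bySet[1][1] is group 1 of the second.
preg_match_all($pattern, $html, $bySet, PREG_SET_ORDER);

print_r($byPattern[1]);
print_r($bySet[1][1]);
```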

    Alternatively, if you're on PHP 5, you could process it with the DOM extension (as XML, or with loadHTML() since real-world HTML is rarely valid XML) and use an XPath query to get what you want.
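    A minimal sketch of the XPath route, assuming the DOM extension is available (it is enabled by default). loadHTML() is used instead of strict XML parsing because scraped pages are often not well-formed; the markup below is a made-up stand-in for $buffer:

```php
<?php
// Made-up markup standing in for $buffer.
$buffer = '<html><body><table><tr><td>'
        . '<a href="showMessage.do?mmsId=11111111&inboxItemId=22222222" title="View message">Har</a>'
        . '</td></tr></table></body></html>';

$doc = new DOMDocument();
// Collect libxml's parse warnings (the bare "&" may trigger one)
// instead of printing them, then discard them.
libxml_use_internal_errors(true);
$doc->loadHTML($buffer);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// The href attribute of every <a> whose href starts with "showMessage.do".
$nodes = $xpath->query('//a[starts-with(@href, "showMessage.do")]/@href');

$hrefs = array();
foreach ($nodes as $attr) {
    $hrefs[] = $attr->value;
}
print_r($hrefs);
```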
     
    Last edited by a moderator: Mar 15, 2008
  4. SPeedY_B

    Ok, using this:
    PHP:
    preg_match_all('/href=["\'](showMessage\.do\?mmsId=[0-9]+\&inboxItemID\=[0-9]+)["\']/', $buffer, $matches);
    print_r($matches);
    Produces this:
    Code:
    Array
    (
        [0] => Array
            (
            )
    
        [1] => Array
            (
            )
    
    )
    I've made $buffer print out at the top of the page before calling the preg function so I can check what's there manually. There are four instances of the string I mentioned (two unique, each printed twice), so I'm not sure why it's producing what look like empty arrays.
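    Going by the HTML in the first post, a likely culprit is letter case: the pattern spells the parameter inboxItemID (capital D), while the page has inboxItemId, and PCRE matches case-sensitively unless the i modifier is added. A minimal sketch comparing the variants on one made-up link:

```php
<?php
// One made-up link in the format from the first post.
$buffer = '<a href="showMessage.do?mmsId=11111111&inboxItemId=22222222" title="View message">';

// As posted: the parameter is spelled "inboxItemID" (capital D).
$asPosted = '/href=["\'](showMessage\.do\?mmsId=[0-9]+\&inboxItemID\=[0-9]+)["\']/';
// Spelled exactly as on the page: "inboxItemId".
$exactCase = '/href=["\'](showMessage\.do\?mmsId=[0-9]+&inboxItemId=[0-9]+)["\']/';
// Or keep the posted spelling and append the "i" modifier to ignore case.
$caseless = $asPosted . 'i';

$n1 = preg_match_all($asPosted, $buffer, $m1);   // 0: case differs
$n2 = preg_match_all($exactCase, $buffer, $m2);  // 1
$n3 = preg_match_all($caseless, $buffer, $m3);   // 1
print_r($m2[1]);
```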
     
  5. SPeedY_B

    Just tried this...
    PHP:
    preg_match_all("/<table>[\s\w\/<>=\\\"]*<\/table>/", $buffer, $matches);
    ...as there's only a single table in the entire page. Yet it doesn't produce the expected result.

    When shoving in...
    PHP:
    $buffer = 'Hello, rah rah. <table>Niff this is <b>a test</b></table> nooch nooch';
    I get back exactly what I'd expect. Not quite sure what's going on.
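    The short test string works because everything between its table tags is letters, spaces, <, > or /, all of which are in the character class. The real page also has characters such as ., ?, &, :, + and - (in the hrefs, phone numbers and dates) that the class doesn't allow, so the [...]* run can't reach </table> and the match fails. A lazy .*? with the s modifier (so . also matches newlines) avoids listing characters at all. A sketch with a made-up fragment:

```php
<?php
// Made-up fragment containing ".", "?", "&" and ":" between the table tags.
$page = "<table>\n<tr><td><a href=\"showMessage.do?mmsId=1&inboxItemId=2\">FW:</a></td></tr>\n</table>";

// The pattern as posted: whitespace, word characters and / < > = " only.
$posted = "/<table>[\s\w\/<>=\\\"]*<\/table>/";
// Lazy "anything" run; the s modifier lets "." match newlines too.
$lazy = '/<table>.*?<\/table>/s';

$n1 = preg_match_all($posted, $page, $m1);  // 0: the run can't get past the first "."
$n2 = preg_match_all($lazy, $page, $m2);    // 1
echo $m2[0][0];
```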
     
  6. X-Istence

    X-Istence * Political User

    Messages:
    6,498
    Location:
    USA
  7. SPeedY_B

    I've moved to a separate test file, since previously I was grabbing the page with cURL; I've pasted in a snippet of the page containing the bits of output I need.
    PHP:
    <?php
    $page = <<<EOF

            Start of document removed.

            <table>
                <tr class="tableheadCheckbox">
                    <th> </th>
                    <th>Preview</th>




                    <th><a href="showInbox.do?sortBy=subject&sortOrder=asc"
                           title="Click here to sort by subject">Subject</a></th>



                    <th><a href="showInbox.do?sortBy=from&sortOrder=asc"
                           title="Click here to sort by originator">From</a></th>



                    <th><a href="showInbox.do?sortBy=received&sortOrder=asc"
                           title="Click here to sort by receive date"><img src="images/tri1.gif" alt="Down arrow to show ascending sort order" /> Received</a></th>



                    <th class="noBorderRight"><a href="showInbox.do?sortBy=expires&sortOrder=desc"
                           title="Click here to sort by expiration date">Expires</a></th>


                </tr>

                <tbody>


                    <tr class="tableheadCheckbox">
                        <td>
                            <div class="moduleFrmBox">
                                <input type="checkbox" name="selectedItems" value="22222222" title="Check this message">
                            </div>
                        </td>
                        <td>
                  <a href="showMessage.do?mmsId=11111111&inboxItemId=22222222" title="View message">

                  <img src="http://139.2.165.14/MacsService/Macs/ContentService/-removed-.jpg" alt="Thumbnail preview of user submitted image" border="0" />

                  </a>
                        </td>
                        <td>


                  <a href="showMessage.do?mmsId=11111111&inboxItemId=22222222" title="View message">


                    Har


                          </a>

                        </td>
                        <td>

                          +4400000001

                        </td>
                        <td>




                          2008/03/14 19:17

                        </td>
                        <td class="noBorderRight">

                                2008/04/13 20:17

                        </td>
                    </tr>

                    <tr class="tableheadCheckbox">
                        <td>
                            <div class="moduleFrmBox">
                                <input type="checkbox" name="selectedItems" value="33333333" title="Check this message">
                            </div>
                        </td>
                        <td>
                  <a href="showMessage.do?mmsId=52546530&inboxItemId=33333333" title="View message">

                  <img src="http://139.2.165.14/MacsService/Macs/ContentService/-removed-.jpg" alt="Thumbnail preview of user submitted image" border="0" />

                  </a>
                        </td>
                        <td>


                  <a href="showMessage.do?mmsId=52546530&inboxItemId=33333333" title="View message">


                    FW:


                          </a>

                        </td>
                        <td>

                          +440000000000

                        </td>
                        <td>




                          2008/03/10 18:56

                        </td>
                        <td class="noBorderRight">

                                2008/04/09 19:56

                        </td>
                    </tr>


                </tbody>
            </table>
            
            End of document removed.

    EOF;
        
    $lines = explode("\n", $page);

    echo '<textarea style="width:110em;height:400px;">'; print_r($lines); echo '</textarea><br />';


        
    preg_match_all('/href=["\'](showMessage\.do\?mmsId=[0-9]+\&inboxItemID\=[0-9]+)["\']/', $page, $matches);


        
    preg_match_all("/<table>[\s\w\/<>=\\\"]*<\/table>/m", $page, $table_matches);

    echo '<strong>ID, etc...</strong><pre>'; print_r($matches); echo '</pre> <strong>Tables...</strong><pre>'; print_r($table_matches); echo '</pre>';
    ?>
    Results in the following
    Code:
    ID, etc...
    Array
    (
        [0] => Array
            (
            )
    
        [1] => Array
            (
            )
    
    )
    Tables...
    Array
    (
        [0] => Array
            (
            )
    
    )
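    Pulling this together for the test file, here's a sketch that matches against $page with the parameter spelled inboxItemId (as it appears in the snippet), uses named groups to get at both IDs, and keys the results by inboxItemId so each message shows up once even though every message has two links (thumbnail and subject). The $page below is a cut-down, made-up stand-in for the snippet. Separately, the /m modifier on the table pattern only changes how ^ and $ behave, so it has no effect there; /s is the modifier that lets . match newlines.

```php
<?php
// Cut-down stand-in for $page: each message link appears twice.
$page = <<<EOF
<a href="showMessage.do?mmsId=11111111&inboxItemId=22222222" title="View message">
<a href="showMessage.do?mmsId=11111111&inboxItemId=22222222" title="View message">
<a href="showMessage.do?mmsId=52546530&inboxItemId=33333333" title="View message">
<a href="showMessage.do?mmsId=52546530&inboxItemId=33333333" title="View message">
EOF;

// Group 1 is the whole href; named groups (?P<name>...) expose each ID.
$pattern = '/href=["\'](showMessage\.do\?mmsId=(?P<mmsId>[0-9]+)&inboxItemId=(?P<inboxItemId>[0-9]+))["\']/';

$messages = array();
if (preg_match_all($pattern, $page, $sets, PREG_SET_ORDER)) {
    foreach ($sets as $set) {
        // Keying by inboxItemId collapses the duplicate links into one entry.
        $messages[$set['inboxItemId']] = array(
            'href'  => $set[1],
            'mmsId' => $set['mmsId'],
        );
    }
}
print_r($messages);
```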
     
  8. X-Istence

    I'll take a look at it when I wake up tomorrow morning. Just got back from soldering a whole bunch of LEDs.
     
  9. SPeedY_B

    Ta muchly, very appreciated. :)
     
  10. SPeedY_B

    *prod*

    Have you had a chance to look at this yet, X?

    Edit: Actually, there's no rush; it seems O2 have changed the way their site works. It seems to be totally ****ing broken for me at the moment, so this script is a bit useless at present. :(

    Edit2: I take it back, sorted it again :p
     
    Last edited: Apr 1, 2008