kelvinluck.com

a stroke of luck

Hacking TXP into submission

Update:

This is a very old page imported from my previous blog. If there is missing content below or anything that doesn’t make sense then please check the page on my old blog.

So, as you will see from my previous two blog entries, I started work on a plug-in to display nicely formatted code in-line in TXP, under the mistaken impression that no such plug-in existed.

I’ve since found out that there was already a plug-in (glx_code), but as part of my learning experience with TXP I decided to push ahead and complete my own plug-in (which uses the excellent GeSHi Generic Syntax Highlighter).

My problem last time was that I wanted my plug-in to allow you to place your code in-line within the tag like so:

<txp:krl_geshiSyntaxHighlight language="php">
function codeThatIsHighlighted() {}
</txp:krl_geshiSyntaxHighlight>
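For context, here is a rough sketch of what an enclosing tag handler like this might look like. It is hypothetical and is not the actual plug-in code. It assumes that Textpattern calls the tag function with the tag's attributes as $atts and the enclosed text as $thing, and it uses htmlspecialchars() in place of GeSHi so that it runs on its own.

```php
<?php
// Hypothetical enclosing-tag handler (NOT the real plug-in).
// Assumption: TXP passes the tag attributes as $atts and the text between
// the opening and closing tags as $thing.
// htmlspecialchars() stands in for GeSHi so the sketch is self-contained.
function krl_geshiSyntaxHighlight_sketch(array $atts, $thing = '')
{
    // Default attribute values for this sketch (assumed, not from the plug-in).
    $atts = array_merge(array('language' => 'php'), $atts);

    // $thing is whatever text TXP hands over for the tag body.
    $code = trim($thing);

    return '<pre class="' . htmlspecialchars($atts['language'], ENT_QUOTES) . '">'
        . htmlspecialchars($code, ENT_NOQUOTES) . '</pre>';
}

echo krl_geshiSyntaxHighlight_sketch(
    array('language' => 'php'),
    "\nfunction codeThatIsHighlighted() {}\n"
), "\n";
// prints: <pre class="php">function codeThatIsHighlighted() {}</pre>
```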

But even if I wrapped the code in code or notextile tags, the overzealous Textile engine still replaced certain characters with their Textile equivalents (so “__” on either side of something would make it italic). Some of the Textile engine was respecting my notextile tags, though: the quotes were no longer being swapped out for their pretty HTML equivalents.
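To see why this matters for code, here is a deliberately simplified stand-in for the “__” rule. It is not Textile’s actual regex, which also handles attributes and more kinds of surrounding punctuation. Fed a line of PHP that uses the __FILE__ magic constant, it turns the constant into italic markup.

```php
<?php
// Simplified stand-in for Textile's "__" span rule; NOT the real Textile regex.
// Text wrapped in double underscores, bounded by whitespace, punctuation or
// the start/end of a line, is wrapped in <i>...</i>.
function simple_italic_span($text)
{
    return preg_replace(
        '/(?<=^|\s|[[:punct:]])__(\S(?:.*?\S)?)__(?=[[:punct:]]|\s|$)/m',
        '<i>$1</i>',
        $text
    );
}

echo simple_italic_span('echo __FILE__;'), "\n";
// prints: echo <i>FILE</i>;
```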

Thus started my trawl through the TXP source to figure out what was going on and how to stop it. My first mistake was assuming that the substitution happened whenever the page was rendered, and so was something I could control from within my plug-in. It took a fair amount of digging to track it down and find out that the Textile engine actually does a pass over an article as you save it and inserts the parsed text into the Body_html column of the textpattern table.
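The order of events is the crux, so here is a toy model of it. This is a hypothetical sketch rather than Textpattern’s code: an array stands in for a row of the textpattern table and a callable stands in for Textile. Because Body_html is computed once at save time, anything that runs when the page is rendered only ever sees text that Textile has already processed.

```php
<?php
// Hypothetical model of the save-time flow (not Textpattern's real code):
// the raw text is kept in Body, the Textile output is computed once and
// stored in Body_html, and page rendering only reads Body_html.
function save_article($body, callable $textile)
{
    return array('Body' => $body, 'Body_html' => $textile($body));
}

function render_article(array $row)
{
    // By the time this runs, Textile's transformations are already baked in.
    return $row['Body_html'];
}

// A fake "Textile" that only knows one substitution, for illustration.
$fakeTextile = function ($text) {
    return str_replace('__FILE__', '<i>FILE</i>', $text);
};

$row = save_article('echo __FILE__;', $fakeTextile);
echo render_article($row), "\n";
// prints: echo <i>FILE</i>;
```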

Once I had this figured out, it became apparent that I would need to make Textile (more) aware of notextile tags. It did seem to pay some attention to them, and when I looked through the source (of textpattern/lib/classTextpattern.php) I found that the glyphs function was aware of notextile, pre, kbd and code tags and purposefully didn’t apply its transformations inside them. However, the span function just applied its transformations regardless. So I borrowed some code from glyphs and modified it slightly to work in the span function.
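Stripped of the Textile specifics, the approach borrowed from glyphs looks roughly like the sketch below. The helper name and the one-rule transform in the usage lines are illustrative assumptions. preg_split with PREG_SPLIT_DELIM_CAPTURE keeps each tag as its own piece, a flag records whether we are inside one of the “off” elements, and only plain text outside them is transformed.

```php
<?php
// Sketch of the split-and-track technique (helper name is hypothetical).
// Text is split on HTML tags, keeping the tags; a flag tracks whether we are
// inside an "off" element; only plain text outside off elements is transformed.
function transform_outside_tags($text, callable $transform, $offtags = 'code|pre|kbd|notextile')
{
    // PREG_SPLIT_DELIM_CAPTURE keeps each matched tag as a separate piece.
    $pieces = preg_split('/(<.*>)/U', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $inside = false;
    $out = array();
    foreach ($pieces as $piece) {
        if (preg_match('/^<(' . $offtags . ')>$/i', $piece)) {
            $inside = true;            // entering an off element
        } elseif (preg_match('/^<\/(' . $offtags . ')>$/i', $piece)) {
            $inside = false;           // leaving an off element
        } elseif (!$inside && $piece !== '' && $piece[0] !== '<') {
            // Plain text outside any off element: apply the rule.
            // Other tags (e.g. <p>) pass through untouched.
            $piece = $transform($piece);
        }
        $out[] = $piece;
    }
    return implode('', $out);
}

// Usage with a one-rule "italic" transform (illustrative only).
$italic = function ($s) {
    return preg_replace('/__(\w+)__/', '<i>$1</i>', $s);
};
echo transform_outside_tags('__a__ <notextile>__b__</notextile> __c__', $italic), "\n";
// prints: <i>a</i> <notextile>__b__</notextile> <i>c</i>
```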

Here is my modified version of the span function:

function span($text)
{
    $qtags = array('\*','\*\*','\?\?','-','__','_','%','\+','~');
    // KL 2005-01-26 - Borrowed some code from the glyphs function to make span aware of notextile, pre, kbd and code tags
    $codepre = false;
    /*  if no html, do a simple search and replace... */
    if (!preg_match("/<.*>/", $text)) {
        foreach($qtags as $f) {
            $text = preg_replace_callback("/
                (?<=^|\s|[[:punct:]]|[{([])
                ($f)
                ($this->c)
                (?::(\S+))?
                ([\w<&].*[\w])
                ([[:punct:];]*)
                $f
                (?=[])}]|[[:punct:]]+|\s|$)
            /xmU", array(&$this, "fSpan"), $text);
        }
        return $text;
    }
    else {
        // codepre = we are in a code / pre / kbd tag - don't apply the span ($qtags)
        // replacements, but do escape htmlspecialchars
        // codepre2 = we are in notextile tags. That means NO textile. So leave
        // everything well alone...
        $codepre = $codepre2 = false;
        $text = preg_split("/(<.*>)/U", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach($text as $line) {
            $offtags = ('code|pre|kbd');
            $offtags2 = ('notextile');

            /*  matches are off if we're between <code> and </code>, <pre> and </pre> etc. */
            if (preg_match('/<(' . $offtags . ')>/i', $line)) $codepre = true;
            if (preg_match('/<(' . $offtags2 . ')>/i', $line)) $codepre2 = true;
            if (preg_match('/<\/(' . $offtags . ')>/i', $line)) $codepre = false;
            if (preg_match('/<\/(' . $offtags2 . ')>/i', $line)) $codepre2 = false;

            if (!$codepre && !$codepre2) {
                foreach($qtags as $f) {
                    $line = preg_replace_callback("/
                        (?<=^|\s|[[:punct:]]|[{([])
                        ($f)
                        ($this->c)
                        (?::(\S+))?
                        ([\w<&].*[\w])
                        ([[:punct:];]*)
                        $f
                        (?=[])}]|[[:punct:]]+|\s|$)
                    /xmU", array(&$this, "fSpan"), $line);
                }
            }
            /* do htmlspecial if between <code> and </code> */
            if ($codepre && !$codepre2) {
                // escape the contents, then turn the escaped tag back into a real one
                $line = htmlspecialchars($line, ENT_NOQUOTES, "UTF-8");
                $line = preg_replace('/&lt;(\/?' . $offtags . ')&gt;/', "<$1>", $line);
            }

            $span_out[] = $line;
        }
        return join('', $span_out);
    }
}
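The other half of the borrowed code is the branch that runs inside code, pre or kbd: the piece is escaped with htmlspecialchars, and the escaped tag itself is then turned back into a real tag. Here is a standalone sketch of just that step; the function name is made up for illustration, and the sketch wraps the alternation in (?:...) so that the optional slash applies to every tag name.

```php
<?php
// Sketch of the "inside <code>/<pre>/<kbd>" step (function name is hypothetical):
// escape the piece, then turn escaped off-tags (&lt;code&gt;, &lt;/pre&gt;, ...)
// back into real tags so that only the contents stay escaped.
function escape_code_piece($piece, $offtags = 'code|pre|kbd')
{
    // ENT_NOQUOTES: leave quotes alone, escape &, < and >.
    $piece = htmlspecialchars($piece, ENT_NOQUOTES, 'UTF-8');
    // (?:...) makes the optional "/" apply to every tag name in $offtags.
    return preg_replace('/&lt;(\/?(?:' . $offtags . '))&gt;/', '<$1>', $piece);
}

echo escape_code_piece('<code>'), "\n";              // prints: <code>
echo escape_code_piece('if ($a < $b && $c)'), "\n"; // prints: if ($a &lt; $b &amp;&amp; $c)
```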

I then noticed that quotes appearing within notextile tags were still being encoded when they shouldn’t have been… So I added the same hack to the glyphs function (as already applied in the span function above):

function glyphs($text)
{
    // fix: hackish
    $text = preg_replace('/"\z/', "\" ", $text);
    $pnc = '[[:punct:]]';

    $glyph_search = array(
        '/([^\s[{(>_*])?\'(?(1)|(?=\s|s\b|'.$pnc.'))/',      //  single closing
        '/\'/',                                              //  single opening
        '/([^\s[{(>_*])?"(?(1)|(?=\s|'.$pnc.'))/',           //  double closing
        '/"/',                                               //  double opening
        '/\b( )?\.{3}/',                                     //  ellipsis
        '/\b([A-Z][A-Z0-9]{2,})\b(?:[(]([^)]*)[)])/',        //  3+ uppercase acronym
        '/\s?--\s?/',                                        //  em dash
        '/\s-\s/',                                           //  en dash
        '/(\d+) ?x ?(\d+)/',                                 //  dimension sign
        '/\b ?[([]TM[])]/i',                                 //  trademark
        '/\b ?[([]R[])]/i',                                  //  registered
        '/\b ?[([]C[])]/i');                                 //  copyright

    $glyph_replace = array('$1&#8217;',   //  single closing
        '&#8216;',                          //  single opening
        '$1&#8221;',                        //  double closing
        '&#8220;',                          //  double opening
        '$1&#8230;',                        //  ellipsis
        '<acronym title="$2">$1</acronym>', //  3+ uppercase acronym
        '&#8212;',                          //  em dash
        ' &#8211; ',                        //  en dash
        '$1&#215;$2',                       //  dimension sign
        '&#8482;',                          //  trademark
        '&#174;',                           //  registered
        '&#169;');                          //  copyright

    /*  if no html, do a simple search and replace... */
    if (!preg_match("/<.*>/", $text)) {
        $text = preg_replace($glyph_search, $glyph_replace, $text);
        return $text;
    }
    else {
        // codepre = we are in a code / pre / kbd tag - don't replace the things from $glyph_search
        // with their html alternatives but do replace other htmlspecialchars
        // codepre2 = we are in notextile tags. That means NO textile. So leave everything - including
        // the things from $glyph_search well alone...
        $codepre = $codepre2 = false;
        $text = preg_split("/(<.*>)/U", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach($text as $line) {
            $offtags = ('code|pre|kbd');
            $offtags2 = ('notextile');

            /*  matches are off if we're between <code> and </code>, <pre> and </pre> etc. */
            if (preg_match('/<(' . $offtags . ')>/i', $line)) $codepre = true;
            if (preg_match('/<(' . $offtags2 . ')>/i', $line)) $codepre2 = true;
            if (preg_match('/<\/(' . $offtags . ')>/i', $line)) $codepre = false;
            if (preg_match('/<\/(' . $offtags2 . ')>/i', $line)) $codepre2 = false;
            if (!preg_match("/<.*>/", $line) && !$codepre && !$codepre2) {
                $line = preg_replace($glyph_search, $glyph_replace, $line);
            }

            /* do htmlspecial if between <code> and </code> */
            if ($codepre && !$codepre2) {
                // escape the contents, then turn the escaped tag back into a real one
                $line = htmlspecialchars($line, ENT_NOQUOTES, "UTF-8");
                $line = preg_replace('/&lt;(\/?' . $offtags . ')&gt;/', "<$1>", $line);
            }

            $glyph_out[] = $line;
        }
        return join('', $glyph_out);
    }
}

[note: the functions above are shown after the hack has been applied; you should be able to find the right function to replace by its name]
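To see what the two double-quote rules from glyphs do on their own, here is a sketch that applies only those two patterns (with the $1 back-reference that keeps the character before a closing quote) and no tag handling at all. The function name is illustrative. Note how an attribute such as language="php" comes out with two closing quotes, which is one reason code needs to be kept away from glyphs.

```php
<?php
// Just the two double-quote rules from glyphs(), applied with no tag handling
// (function name is hypothetical). Closing quotes are replaced first; any
// quote still left afterwards is treated as an opening quote.
function glyph_double_quotes($text)
{
    $pnc = '[[:punct:]]';
    $search = array(
        // closing: a quote preceded by a character other than whitespace or
        // [ { ( > _ *, or a quote followed by whitespace or punctuation
        '/([^\s[{(>_*])?"(?(1)|(?=\s|' . $pnc . '))/',
        // opening: every quote still left
        '/"/',
    );
    $replace = array('$1&#8221;', '&#8220;');
    return preg_replace($search, $replace, $text);
}

echo glyph_double_quotes('"hi"'), "\n";           // prints: &#8220;hi&#8221;
echo glyph_double_quotes('language="php"'), "\n"; // prints: language=&#8221;php&#8221;
```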

With these hacks in place my plug-in works more or less how I want it to… I’d be interested to find out whether people think these hacks will break other functionality or incur a performance hit, or whether there was a reason that Textile ignored notextile tags when replacing “span” style tags in the first place.

If you can answer those questions or suggest a better way to solve my problem, please leave a comment and let me know… I’d rather not be hacking the TXP core, especially with version 1 hopefully out soon. But it is nice to be able to drop little highlighted code snippets into your blog without having to upload any files or anything…
