kelvinluck.com

a stroke of luck

krl_geshiSyntaxHighlight released


Update:

This is a very old page imported from my previous blog. If there is missing content below or anything that doesn’t make sense then please check the page on my old blog.

So, it’s finally deserving of a 0.1 release… There are still a few teething problems, but in general the krl_geshiSyntaxHighlight plugin is ready to go. For downloads and installation instructions, please check out its project page.
Please leave any comments or suggestions for improvement at the bottom of this article…



Hacking TXP into submission



So, as you will see from my previous two blog entries, under the misunderstanding that there was no plug-in to allow TXP to display nicely formatted code in-line I started work on a plug-in to do just that.

I’ve since found out that there was already a plug-in (glx_code) but as part of my learning experience with TXP I decided to push ahead and complete my plug-in (which uses the excellent GeSHi Generic Syntax Highlighter).

My problem last time was that I wanted my plug-in to allow you to place your code in-line within the tag like so:

<txp:krl_geshiSyntaxHighlight language="php">
function codeThatIsHighlighted() {}
</txp:krl_geshiSyntaxHighlight>

But even if I wrapped the code in code or notextile tags, the overzealous Textile engine still replaced certain characters with their Textile equivalents (so “__” on either side of something would make it italic). Part of the Textile engine was respecting my notextile tags, though: the quotes were no longer being swapped out for their pretty HTML equivalents.

Thus started my trawl through the TXP source to figure out what was going on and how to stop it. My first mistake was presuming that the substitution occurred whenever the page was rendered and was something I could control from within my plug-in. It took a fair amount of digging to find out that the Textile engine actually does a pass over an article as you save it and stores the parsed text in the Body_html column of the textpattern table.

Once I had this figured out, it became apparent that I would need to make Textile (more) aware of notextile tags. It did seem to pay some attention to them, and when I looked through the source (of textpattern/lib/classTextpattern.php) I found that the glyphs function was aware of notextile, pre, kbd and code tags and purposely didn’t apply its transformations to them. However, the span function applied its transformations regardless. So I borrowed some code from glyphs and modified it slightly to work in the span function.

Here is my modified version of the span function:

function span($text)
{
    $qtags = array('\*','\*\*','\?\?','-','__','_','%','\+','~');
    // KL 2005-01-26 - Borrowed some code from the glyphs function to make span aware of notextile, pre, kbd and code tags
    $codepre = false;
    /*  if no html, do a simple search and replace... */
    if (!preg_match("/<.*>/", $text)) {
        foreach($qtags as $f) {
            $text = preg_replace_callback("/
                (?<=^|\s|[[:punct:]]|[{([])
                ($f)
                ($this->c)
                (?::(\S+))?
                ([\w<&].*[\w])
                ([[:punct:];]*)
                $f
                (?=[])}]|[[:punct:]]+|\s|$)
            /xmU"
            , array(&$this, "fSpan"), $text);
        }
        return $text;
    }
    else {
        // codepre = we are in a code / pre / kbd tag - don't replace the things from $glyph_search
        // with their html alternatives but do replace other htmlspecialchars
        // codepre2 = we are in notextile tags. That means NO textile. So leave everything - including
        // the things from $glyph_search well alone...
        $codepre = $codepre2 = false;
        $text = preg_split("/(<.*>)/U", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach($text as $line) {
            $offtags = ('code|pre|kbd');
            $offtags2 = ('notextile');

            /*  matches are off if we're between <code>, <pre> etc. */
            if (preg_match('/<(' . $offtags . ')>/i', $line)) $codepre = true;
            if (preg_match('/<(' . $offtags2 . ')>/i', $line)) $codepre2 = true;
            if (preg_match('/<\/(' . $offtags . ')>/i', $line)) $codepre = false;
            if (preg_match('/<\/(' . $offtags2 . ')>/i', $line)) $codepre2 = false;

            if (!$codepre && !$codepre2) {
                foreach($qtags as $f) {
                    $line = preg_replace_callback("/
                        (?<=^|\s|[[:punct:]]|[{([])
                        ($f)
                        ($this->c)
                        (?::(\S+))?
                        ([\w<&].*[\w])
                        ([[:punct:];]*)
                        $f
                        (?=[])}]|[[:punct:]]+|\s|$)
                    /xmU"
                    , array(&$this, "fSpan"), $line);
                }
            }
            /* do htmlspecial if between <code> */
            if ($codepre && !$codepre2) {
                $line = htmlspecialchars($line, ENT_NOQUOTES, "UTF-8");
                $line = preg_replace('/&lt;(\/?' . $offtags . ')&gt;/', "<$1>", $line);
            }

            $span_out[] = $line;
        }
        return join('', $span_out);
    }
}
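Stripped of the Textile specifics, the trick in the modified span function is to split the text on tags and only run a transformation on the pieces that sit outside the protected tags. Here is a minimal standalone sketch of that idea. Note that transformOutsideTags and the simplified italic rule are hypothetical names of mine for illustration, not anything in the TXP source, and the tag matching is deliberately cruder than what Textile does.

```php
<?php
// Sketch only: transformOutsideTags() is a hypothetical helper, not part of Textile or TXP.
// It runs $transform on text that is not inside one of the protected tags.
function transformOutsideTags(string $text, callable $transform, array $offTags = ['notextile', 'code', 'pre', 'kbd']): string
{
    $names = implode('|', array_map('preg_quote', $offTags));
    // With PREG_SPLIT_DELIM_CAPTURE the odd-numbered pieces are the tags themselves.
    $parts = preg_split('/(<[^>]*>)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $depth = 0; // how many protected tags we are currently inside
    $out = '';
    foreach ($parts as $i => $part) {
        $isTag = ($i % 2 === 1);
        if ($isTag && preg_match('/^<(' . $names . ')\b/i', $part)) {
            $depth++;
        } elseif ($isTag && preg_match('/^<\/(' . $names . ')\s*>$/i', $part)) {
            $depth = max(0, $depth - 1);
        } elseif (!$isTag && $depth === 0) {
            $part = $transform($part);
        }
        $out .= $part;
    }
    return $out;
}

// A deliberately simplified stand-in for Textile's "__" italic rule.
$italic = function (string $s): string {
    return preg_replace('/__(\w.*?\w)__/', '<i>$1</i>', $s);
};

echo transformOutsideTags('__hi__ <notextile>dirname(__FILE__)</notextile>', $italic);
```

In this sketch the "__hi__" outside the tags becomes italic, while the "__FILE__" inside the notextile tags is left alone, which is the behaviour I was after. Counting depth rather than flipping a boolean is just a small safety net for nested tags.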

I then noticed that quotes were still being encoded when they appeared within notextile tags, when they shouldn’t have been… So I added the same hack to the glyphs function, splitting the "off" tags into two groups just as in the span function above:

function glyphs($text)
{
    // fix: hackish
    $text = preg_replace('/"\z/', "\" ", $text);
    $pnc = '[[:punct:]]';

    $glyph_search = array(
    '/([^\s[{(>_*])?\'(?(1)|(?=\s|s\b|'.$pnc.'))/',      //  single closing
    '/\'/',                                              //  single opening
    '/([^\s[{(>_*])?"(?(1)|(?=\s|'.$pnc.'))/',           //  double closing
    '/"/',                                               //  double opening
    '/\b( )?\.{3}/',                                     //  ellipsis
    '/\b([A-Z][A-Z0-9]{2,})\b(?:[(]([^)]*)[)])/',        //  3+ uppercase acronym
    '/\s?--\s?/',                                        //  em dash
    '/\s-\s/',                                           //  en dash
    '/(\d+) ?x ?(\d+)/',                                 //  dimension sign
    '/\b ?[([]TM[])]/i',                                 //  trademark
    '/\b ?[([]R[])]/i',                                  //  registered
    '/\b ?[([]C[])]/i');                                 //  copyright

    $glyph_replace = array('$1&#8217;',     //  single closing
    '&#8216;',                          //  single opening
    '$1&#8221;',                        //  double closing
    '&#8220;',                          //  double opening
    '$1&#8230;',                        //  ellipsis
    '<acronym title="$2">$1</acronym>', //  3+ uppercase acronym
    '&#8212;',                          //  em dash
    ' &#8211; ',                        //  en dash
    '$1&#215;$2',                       //  dimension sign
    '&#8482;',                          //  trademark
    '&#174;',                           //  registered
    '&#169;');                          //  copyright

    /*  if no html, do a simple search and replace... */
    if (!preg_match("/<.*>/", $text)) {
        $text = preg_replace($glyph_search, $glyph_replace, $text);
        return $text;
    }
    else {
        // codepre = we are in a code / pre / kbd tag - don't replace the things from $glyph_search
        // with their html alternatives but do replace other htmlspecialchars
        // codepre2 = we are in notextile tags. That means NO textile. So leave everything - including
        // the things from $glyph_search well alone...
        $codepre = $codepre2 = false;
        $text = preg_split("/(<.*>)/U", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach($text as $line) {
            $offtags = ('code|pre|kbd');
            $offtags2 = ('notextile');

            /*  matches are off if we're between <code>, <pre> etc. */
            if (preg_match('/<(' . $offtags . ')>/i', $line)) $codepre = true;
            if (preg_match('/<(' . $offtags2 . ')>/i', $line)) $codepre2 = true;
            if (preg_match('/<\/(' . $offtags . ')>/i', $line)) $codepre = false;
            if (preg_match('/<\/(' . $offtags2 . ')>/i', $line)) $codepre2 = false;
            if (!preg_match("/<.*>/", $line) && !$codepre && !$codepre2) {
                $line = preg_replace($glyph_search, $glyph_replace, $line);
            }

            /* do htmlspecial if between <code> */
            if ($codepre && !$codepre2) {
                $line = htmlspecialchars($line, ENT_NOQUOTES, "UTF-8");
                $line = preg_replace('/&lt;(\/?' . $offtags . ')&gt;/', "<$1>", $line);
            }

            $glyph_out[] = $line;
        }
        return join('', $glyph_out);
    }
}

[note: line numbers are after the above hack has been applied - you should be able to find the right function to replace anyway]

With these hacks in place, my plug-in works more or less how I want it to… I am interested to find out whether people think these hacks will break other functionality or incur a performance hit, or whether there was a reason that Textile ignored notextile tags when replacing “span” style tags in the first place.

Anyone who can answer those questions or can suggest a better way I can solve my problem please leave a comment and let me know… I’d rather not be hacking the TXP core – especially with version 1 hopefully out soon. But it seems nice to be able to drop little highlighted code snippets into your blog without requiring any uploading of files or anything…

Plugin to add GeSHi syntax highlighting to Textpattern – Part II



Update:

see the krl_geshiSyntaxHighlight project page for more information.

…continued from yesterday’s post.

EDIT: Please note that the code samples in this post look better than they did when I wrote it, because I have since updated the engine which generates them (as you will see if you read my other entries). So when I talk about quotes, underscores and the like being changed into weird HTML entities, it was true!

OK – thanks to some people on the textpattern forums I’ve managed to half fix the problem from yesterday. By placing <notextile> tags around the code in my example I now have the quotes not being changed into strange HTML entities. As you can see here:

function GeSHi ($source, $language, $path = '')
{
    if ($path == '') {
        $path = dirname( __FILE__ ).'/geshi/';
    }

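Unfortunately the __FILE__ is still getting translated: first Textile turns the double underscores into italic tags, and then those tags get escaped so that they show up literally as &lt;i&gt;FILE&lt;/i&gt;.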

While I was wondering if there was a way to avoid this I came across Johan Nilsson’s glx_code plugin…

How come I didn’t find it yesterday when I was looking for a way to highlight my code? Oh well, I’ve gone far enough along my path now that I need to complete it! And the way that the glx_code plugin works gave me an idea. It avoids any issues with Textile by loading the code sample from a file rather than having it included inside the page (as my approach did). So I thought I would add the option to do this to my plugin. And here is the result, marking up the above bit of code:

function GeSHi ($source, $language, $path = '')
{
    if ($path == '') {
        $path = dirname( __FILE__ ).'/geshi/';
    }

Sweet, that seems to work nicely :D I’m not sure it is a perfect solution, because it requires creating a file on my server for each code snippet I want to highlight, but it is at least functional. I’ll leave it like that for now and see if I can find a way to get the contents of the tag without Textile getting its grubby mitts on it first!
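For what it’s worth, the file-based approach boils down to something like the sketch below. Here highlightFile is a hypothetical name of mine rather than the plugin’s actual code, and the plain pre fallback is an assumption added so that the snippet runs without GeSHi installed. The GeSHi constructor and parse_code() are the library’s usual entry points.

```php
<?php
// Sketch only: highlightFile() is a hypothetical helper, not the plugin's real code.
// It reads a snippet from disk, so Textile never gets a chance to touch it.
function highlightFile(string $file, string $language): string
{
    if (!is_readable($file)) {
        return ''; // assumption: fail quietly if the snippet file is missing
    }
    $source = file_get_contents($file);
    if ($source === false) {
        return '';
    }
    if (class_exists('GeSHi')) {
        // GeSHi is only used if geshi.php has already been included.
        $geshi = new GeSHi($source, $language);
        return $geshi->parse_code();
    }
    // Fallback (my assumption): escape the source and wrap it in <pre> tags.
    return '<pre>' . htmlspecialchars($source, ENT_NOQUOTES, 'UTF-8') . '</pre>';
}
```

Because the source comes straight off the disk, the double underscores in __FILE__ reach the highlighter untouched. The cost is the one mentioned above: one file per snippet.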

And I’ll be back soon with another installment, which will hopefully contain the final plugin and instructions for its use…



Plugin to add GeSHi syntax highlighting to Textpattern



Update:

see the krl_geshiSyntaxHighlight project page for more information.

So I have this new blog thing and it occurred to me that one thing I would want to do with it is post code I’ve written. I had a look around and it didn’t seem like there was a plugin available to do syntax highlighting for code posted to Textpattern. So I thought that adding the syntax highlighting capabilities of GeSHi to it would be a good quick project for getting to know a bit more about Textpattern.

And that’s exactly what I’ve done! It’s functional but not quite perfect at the moment, and it’s past my bedtime, so this is as far as I got…

First, download GeSHi (I got 1.0.4, the latest at the time of writing). Then extract it so that the geshi.php file is somewhere in your include path and the geshi folder (which contains all the language files) sits at the same level as this file.

Then edit the geshi.php file so that the relevant lines look like this:

function GeSHi ($source, $language, $path = '')
{
    if ($path == '') {
        $path = dirname( __FILE__ ).'/geshi/';
    }

D’oh! Just looking at that I have noticed that there is a problem with where htmlentities is called within GeSHi. As you can see, the quote marks have got all messed up in the code above. So I’m going to have to leave it for the moment I think and return to getting this polished along with providing the plugin tomorrow…