Syntax highlighter for C++

X-Istence · 13 Nov 2008

Please see this post for the most up-to-date sed script: http://forum.osnn.net/showpost.php?p=856765&postcount=8

--

I am a sick, sick person, and I love regexes. I have written a rather quick and dirty C++ parser that does a few things (I am using this to create my portfolio website). It is a sed script that goes through the C++ files it is handed.

It also does some other cool things. For example, I wanted to embed links in my C++ source files so that they come out as real links on the web page, so I figured I would take a page from the Markdown idea.

Code:

{Text here}[url here]

This allows the script to create links from markers embedded in the C++ source files. I figured I'd post it here.

Code:

#!/usr/bin/sed -E -f 

# Remove anything HTML is going to hate us for
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class=\"comment\">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class=\"preproc\">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href=\"\.\/\1tml\" alt=\"\1\">\1<\/a>"/
	
	# Only process URL's, nothing else!
	b url
}

# Replace text within quotes
s/\"([^"]*)\"/\"<span class=\"text\">\1<\/span>\"/g

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<span class=\"keyword\">\1<\/span>/g

# Types
s/(int)/<span class=\"keyword\">\1<\/span>/g
s/(char)/<span class=\"keyword\">\1<\/span>/g
s/(struct)/<span class=\"keyword\">\1<\/span>/g

# Keywords
s/(switch)/<span class=\"keyword\">\1<\/span>/g
s/(case)/<span class=\"keyword\">\1<\/span>/g
s/(default)/<span class=\"keyword\">\1<\/span>/g
s/(new)/<span class=\"keyword\">\1<\/span>/g
s/(delete)/<span class=\"keyword\">\1<\/span>/g
s/(typedef)/<span class=\"keyword\">\1<\/span>/g
s/(return)/<span class=\"keyword\">\1<\/span>/g
s/(public:)/<span class=\"keyword\">\1<\/span>/g
s/(private:)/<span class=\"keyword\">\1<\/span>/g
s/(protected:)/<span class=\"keyword\">\1<\/span>/g
s/(const)/<span class=\"keyword\">\1<\/span>/g
s/(friend)/<span class=\"keyword\">\1<\/span>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class )/<span class=\"keyword\">class<\/span> /g

: url
# Replace URL's {Text here}[URL here]
s/\{([^\{]*)\}\[([^\[]*)\]/<a href=\"\2\" alt=\"\1\">\1<\/a>/g
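
To try the {Text here}[url here] rule on its own, outside the full script, here is a minimal sketch. The function name linkify and the example URLs are made up for illustration. The rule is a lightly simplified variant of the ": url" rule above, and it assumes a sed that accepts -E.

```shell
# linkify: hypothetical helper name, not part of the original script.
# Turns {Text here}[url here] into an HTML link, like the ": url" rule above.
# [^}]* and [^]]* stop at the first closing brace/bracket, so several
# links on one line are handled separately.
linkify() {
	sed -E 's/\{([^}]*)\}\[([^]]*)\]/<a href="\2" alt="\1">\1<\/a>/g'
}

# The input line is invented for this sketch.
printf '%s\n' '// See {the docs}[http://example.com/docs] for details' | linkify
# prints: // See <a href="http://example.com/docs" alt="the docs">the docs</a> for details
```

The alt attribute is kept only to mirror what the script in this thread emits.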

LordOfLA · 13 Nov 2008

That's evil.

Care to explain the regexes for those of us not versed in them? Also those of us under the influence of alcohol.

X-Istence · 13 Nov 2008

Do I have to?

Maybe later, I am working on some stuff.

Geffy · 13 Nov 2008

Which ones? Most of them aren't that complicated. You can always read regular-expressions.info to get a better grasp of regexps.
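
For anyone who does want one of them spelled out, here is a small sketch of what a single keyword rule from the script does. The function names naive_int and bounded_int are invented for this example. The second function is not from the script: it is an assumed, portable way to avoid matching inside longer identifiers, shown only for comparison.

```shell
# naive_int: the rule as the script has it (the function name is hypothetical).
# (int) captures the text "int", \1 puts it back inside the span,
# and the trailing g repeats the substitution along the whole line.
naive_int() {
	sed -E 's/(int)/<span class="keyword">\1<\/span>/g'
}

# bounded_int: an assumed variant, not from the script. It only matches "int"
# when the character on each side is not a letter, digit or underscore.
bounded_int() {
	sed -E 's/(^|[^[:alnum:]_])int([^[:alnum:]_]|$)/\1<span class="keyword">int<\/span>\2/g'
}

printf '%s\n' 'int printed;' | naive_int
# prints: <span class="keyword">int</span> pr<span class="keyword">int</span>ed;
printf '%s\n' 'int printed;' | bounded_int
# prints: <span class="keyword">int</span> printed;
```

One caveat with the bounded variant: the separator character is consumed by the match, so two keywords separated by a single character (for example "int int") would only get the first one highlighted.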

X-Istence · 18 Nov 2008

Code:

#!/usr/bin/sed -E -f 

# Remove anything HTML is going to hate us for
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class="comment">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" alt="\1">\1<\/a>"/
	
	# Process no more
	b end
}

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {

	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b noloop
	
	# No, they are apparently not. This means we replace the quote with the correct span tag
	s/"([^"]*)/"<span class="text">\1/
	
	# Label the loop
	: loop
	
	# Output the text to stdout, as normal
	n
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"/!b loop
	
	# Ah, we found it. Replace it with the correct span tag.
	s/"/<\/span>"/
	
	# Parse only URL's in string literals.
	b url
	
	# Label the noloop branch
	: noloop
	
	# Search and replace the text
	s/"([^"]*)"/"<span class="text">\1<\/span>"/g
	
	# Parse only URL's in string literals.
	b url
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<span class=\"keyword\">\1<\/span>/g

# Types
s/(int)/<span class="keyword">\1<\/span>/g
s/(char)/<span class="keyword">\1<\/span>/g
s/(struct)/<span class="keyword">\1<\/span>/g

# Keywords
s/(switch)/<span class="keyword">\1<\/span>/g
s/(case)/<span class="keyword">\1<\/span>/g
s/(default)/<span class="keyword">\1<\/span>/g
s/(new)/<span class="keyword">\1<\/span>/g
s/(delete)/<span class="keyword">\1<\/span>/g
s/(typedef)/<span class="keyword">\1<\/span>/g
s/(return)/<span class="keyword">\1<\/span>/g
s/(const)/<span class="keyword">\1<\/span>/g
s/(friend)/<span class="keyword">\1<\/span>/g

s/(public:)/<span class="keyword">\1<\/span>/g
s/(private:)/<span class="keyword">\1<\/span>/g
s/(protected:)/<span class="keyword">\1<\/span>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class )/<span class="keyword">class<\/span> /g

: url
# Replace URL's {Text here}[URL here]
s/\{([^\{]*)\}\[([^\[]*)\]/<a href="\2" alt="\1">\1<\/a>/g

: end

This version is a little bit more robust, as I added some code to deal with string literals. I know of one edge case that won't be parsed properly; does anyone see it? I will point out that it is in the string literal parsing.
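
For readers following the string literal loop: sed has two "next line" commands, n and N, and they behave differently. This version uses n, while the later versions in this thread use N. The sketch below only contrasts the two commands. The helper names are made up, and nothing here is meant as the answer to the edge case question.

```shell
# Both helper names are hypothetical, chosen for this sketch.

# pair_with_n: "n" prints the current line and replaces it with the next one,
# so the pattern space never contains a newline for the s command to find.
pair_with_n() {
	sed 'n;s/\n/ | /'
}

# pair_with_N: "N" appends the next line to the pattern space, separated by
# a newline, so one regex can see both lines at once.
pair_with_N() {
	sed 'N;s/\n/ | /'
}

printf 'one\ntwo\n' | pair_with_n
# prints "one" and "two" on separate lines, unchanged
printf 'one\ntwo\n' | pair_with_N
# prints: one | two
```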

X-Istence · 18 Nov 2008

So, yeah ... that version had a few flaws in it, which I only noticed as I pushed on towards bigger and better things. Of course.

Code:

#!/usr/bin/sed -E -f 

# Remove anything HTML is going to hate us for
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class="comment">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" alt="\1">\1<\/a>"/
	
	# Process no more
	b end
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<keyword>\1<\/keyword>/g

# Types
s/(int)/<keyword>\1<\/keyword>/g
s/(char)/<keyword>\1<\/keyword>/g
s/(struct)/<keyword>\1<\/keyword>/g

# Keywords
s/(switch)/<keyword>\1<\/keyword>/g
s/(case)/<keyword>\1<\/keyword>/g
s/(default)/<keyword>\1<\/keyword>/g
s/(new)/<keyword>\1<\/keyword>/g
s/(delete)/<keyword>\1<\/keyword>/g
s/(typedef)/<keyword>\1<\/keyword>/g
s/(return)/<keyword>\1<\/keyword>/g
s/(const)/<keyword>\1<\/keyword>/g
s/(friend)/<keyword>\1<\/keyword>/g

s/(public:)/<keyword>\1<\/keyword>/g
s/(private:)/<keyword>\1<\/keyword>/g
s/(protected:)/<keyword>\1<\/keyword>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class) /<keyword>\1<\/keyword> /g

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {

	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b noloop
	
	# No, they are apparently not. This means we replace the quote with the correct span tag
	s/"([^"]*)/<text>"\1/
	
	# Label the loop
	: loop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"[^"]+"/!b loop
	
	# Ah, we found it. Replace it with the correct span tag.
	s/("[^"]+)"/\1"<\/text>/
	
	b endquote
	
	# Label the noloop branch
	: noloop
	
	s/"([^"]*)"/<text>"\1"<\/text>/g
	
	: endquote
	
	# String literals should not contain "syntax" highlighted code. So we remove all keyword tags from them
	# Label removetags
	: removetags

	# Remove <keyword> and </keyword> from the source file
	s/("[^<]+)<keyword>([^"]+")/\1\2/g
	s/("[^<]+)<\/keyword>([^"]+")/\1\2/g

	# Check if there are any more keyword tags left in this part of the string literal
	# if so, we branch back to removetags. We basically loop until this condition returns false.
	/"[^"]+<keyword>[^"]+"/b removetags
}

: url
# Replace URL's {Text here}[URL here]
s/\{([^\}]*)\}\[([^\[]*)\]/<a href="\2" alt="\1">\1<\/a>/g

: end

# Replace <keyword> and </keyword> with their span equivalent
s/<keyword>/<span class="keyword">/g
s/<\/keyword>/<\/span>/g
s/<text>/<span class="text">/g
s/<\/text>/<\/span>/g

New and improved version!
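
The idea in this version (mark keywords with a placeholder <keyword> tag, strip the placeholders that ended up inside string literals, and only then convert what is left to spans) can be tried on a single line with the sketch below. The name highlight_line is hypothetical and only the return keyword is handled. The sketch also swaps in sed's t command for the script's explicit test-and-branch loop, so treat it as an illustration of the technique rather than a copy of the script.

```shell
# highlight_line: hypothetical helper, handles one keyword on one line.
highlight_line() {
	sed -E '
# Step 1: wrap every "return" in a placeholder tag, even inside strings.
s/(return)/<keyword>\1<\/keyword>/g

# Step 2: remove one opening or closing placeholder found between two quotes,
# then loop ("t" branches only if the substitution just changed something).
: strip
s/("[^"<]*)<\/?keyword>([^"]*")/\1\2/
t strip

# Step 3: convert the surviving placeholders to real spans.
s/<keyword>/<span class="keyword">/g
s/<\/keyword>/<\/span>/g
'
}

printf '%s\n' 'return "no return here";' | highlight_line
# prints: <span class="keyword">return</span> "no return here";
```

Like the script, this looks at any pair of quotes on the line, so code sitting between two separate string literals can also lose its highlighting.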

osnnraptor · 11 Dec 2008

For WordPress I use wp-syntax, which is based on GeSHi: http://qbnz.com/highlighter/

X-Istence · 23 Feb 2009

Code:

#!/usr/bin/sed -E -f 

###
 # Copyright (c) 2009 Bert JW Regeer <xistence@0x58.com>
 #
 # Permission to use, copy, modify, and distribute this software for any
 # purpose with or without fee is hereby granted, provided that the above
 # copyright notice and this permission notice appear in all copies.
 #
 # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 # WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 # MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 # ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 # WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 # ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 # OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 #
##

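# Remove anything HTML is going to hate us for
# (the ampersand has to go first, otherwise the entities added below get escaped again)
s/&/\&amp;/g
s/</\&lt;/g
s/>/\&gt;/g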

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" alt="\1">\1<\/a>"/
	
	# For pre-processor directives we do no other processing what so ever!
	b
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<keyword>\1<\/keyword>/g

# Types
s/(int)/<keyword>\1<\/keyword>/g
s/(char)/<keyword>\1<\/keyword>/g
s/(struct)/<keyword>\1<\/keyword>/g

# Keywords
s/(switch)/<keyword>\1<\/keyword>/g
s/(case)/<keyword>\1<\/keyword>/g
s/(default)/<keyword>\1<\/keyword>/g
s/(new)/<keyword>\1<\/keyword>/g
s/(delete)/<keyword>\1<\/keyword>/g
s/(typedef)/<keyword>\1<\/keyword>/g
s/(return)/<keyword>\1<\/keyword>/g
s/(const)/<keyword>\1<\/keyword>/g
s/(friend)/<keyword>\1<\/keyword>/g

s/(public:)/<keyword>\1<\/keyword>/g
s/(private:)/<keyword>\1<\/keyword>/g
s/(protected:)/<keyword>\1<\/keyword>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class) /<keyword>\1<\/keyword> /g

# Multiline comments
# Issues:
#
# It is valid C/C++ to do this:
#
# /* this is a comment */ myclass = new myclass(); /* comment again */
# this parse has one hell of a greedy regular expression, if you can figure out a way to make it not-greedy, you sir are a god
# back to the issue
# instead of turning that into the following <comment> <code> <comment> it becomes <comment>. Yes, the entire line
# is now a comment. That is bad. So don't use multiple comments on the same line, and you will be fine!


/\/\*/ {
	# Are both the opening and closing comment delimiters on the same line? If so, branch to cnoloop
	/\/\*([^\*][^\/]+)*\*\//b cnoloop
	
	# No, they are apparently not. This means we replace the quote with the correct span tag
	# s/(\/\*.*$)/<comment>\1/
	
	# Label the loop
	: cloop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find the closing */ yet? If not, we branch to label cloop
	/\/\*.*\*\//!b cloop
	
	# Label the noloop branch
	: cnoloop
	
	s/(\/\*.*\*\/)/<comment>\1<\/comment>/g
	
	: endcomment

	: cremovetags
	
	s/((\/\*)[^<]+)<keyword>(.*\*\/)/\1\3/g
	s/((\/\*)[^<]+)<\/keyword>(.*\*\/)/\1\3/g
	
	/\/\*[^<]+<keyword>([^\*][^\/]+)*\*\//b cremovetags
	
	# We don't want to process literal strings
	
	b end
}

/\/\/ .*/ {
	s/(\/\/ .*)/<comment>\1<\/comment>/
	
	: cpremovetags

	s/(\/\/ [^<]*)<keyword>(.*)$/\1\2/g
	s/(\/\/ [^<]*)<\/keyword>(.*)$/\1\2/g
	
	/\/\/ [^<]*<keyword>.*$/b cpremovetags
	
	# We don't want to process literal strings
	
	b end
}

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {
	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b qnoloop
	
	# Label the loop
	: qloop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"[^"]+"/!b qloop
	
	# Label the noloop branch
	: qnoloop
	
	# String literals should not contain "syntax" highlighted code. So we remove all keyword tags from them
	# Label removetags
	: qremovetags

	# Remove <keyword> and </keyword> from the source file
	s/("[^<]*)<keyword>([^"]+")/\1\2/g
	s/("[^<]*)<\/keyword>([^"]+")/\1\2/g

	# Check if there are any more keyword tags left in this part of the string literal
	# if so, we branch back to removetags. We basically loop until this condition returns false.
	/"[^"]*<keyword>[^"]+"/b qremovetags
	
	s/"([^"]+)"/"<text>\1<\/text>"/g
}

: end

s/<([^\/][^>]+)>/<span class="\1">/g
s/<\/[^>]+>/<\/span>/g

: url

# Replace URL's {Text here}[URL here]
s/\{([^\}]*)\}\[([^\[]*)\]/<a href="\2" alt="\1">\1<\/a>/g

Updated to the latest version I had sitting in my Subversion repository. I also added a license so that people who would like to use it can do so without infringing upon my copyright!
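
On the "make it not-greedy" challenge in the script's comments: POSIX extended regexes have no lazy quantifiers, but a commonly used workaround is to spell out what may appear inside a comment, so the match cannot run past the first closing */. The sketch below applies that to the example line from the script's comments. The name mark_comments is hypothetical, it only covers comments that open and close on the same line, and it has not been wired into the full script.

```shell
# mark_comments: hypothetical helper; wraps each /* ... */ on a line separately.
# The pattern, piece by piece:
#   /\*                 the opening delimiter
#   ([^*]|\*+[^*/])*    any non-star, or stars followed by something other than * or /
#   \*+/                one or more stars and the closing slash
# "#" is used as the s delimiter so the slashes need no escaping.
mark_comments() {
	sed -E 's#/\*([^*]|\*+[^*/])*\*+/#<comment>&</comment>#g'
}

printf '%s\n' '/* this is a comment */ myclass = new myclass(); /* comment again */' | mark_comments
# prints: <comment>/* this is a comment */</comment> myclass = new myclass(); <comment>/* comment again */</comment>
```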
