Syntax highlighter for C++

X-Istence

See this post for the most up-to-date sed script: http://forum.osnn.net/showpost.php?p=856765&postcount=8

--

I am a sick, sick person, and I love regexes :p. I have written a rather quick and dirty C++ syntax highlighter that does a few things (I am using it to build my portfolio website). It is a sed script that processes the C++ files it is handed.

It also does some other cool things. For example, I wanted to embed links in my C++ source files so that they come out as real links on the web page, so I took a page from Markdown:

Code:
{Text here}[url here]

This lets the script turn links embedded in the C++ source files into HTML links. I figured I'd post it here.
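For anyone who reads Python more easily than sed, here is a minimal sketch of the same link convention. It is an illustration only: the function name `embed_links` and the example URL are made up, and unlike the sed script it also escapes `&` because it leans on `html.escape`.

```python
import html
import re

# Same idea as the sed rule: {text}[url]. The character classes are an
# assumption, chosen so neither part can run past its closing delimiter.
LINK = re.compile(r"\{([^{}]*)\}\[([^\[\]]*)\]")


def embed_links(line: str) -> str:
    """Escape HTML first, then turn {text}[url] into an anchor tag."""
    escaped = html.escape(line, quote=False)
    return LINK.sub(r'<a href="\2">\1</a>', escaped)


if __name__ == "__main__":
    print(embed_links("// see {docs}[http://example.com/a] for details"))
```

Escaping has to happen before the link rule runs, otherwise the `<a>` tag the rule emits would itself be escaped.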

Code:
#!/usr/bin/sed -Ef

# Remove anything HTML is going to hate us for
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class=\"comment\">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class=\"preproc\">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href=\"\.\/\1tml\" alt=\"\1\">\1<\/a>"/
	
	# Only process URL's, nothing else!
	b url
}

# Replace text within quotes
s/\"([^"]*)\"/\"<span class=\"text\">\1<\/span>\"/g

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<span class=\"keyword\">\1<\/span>/g

# Types
s/(int)/<span class=\"keyword\">\1<\/span>/g
s/(char)/<span class=\"keyword\">\1<\/span>/g
s/(struct)/<span class=\"keyword\">\1<\/span>/g

# Keywords
s/(switch)/<span class=\"keyword\">\1<\/span>/g
s/(case)/<span class=\"keyword\">\1<\/span>/g
s/(default)/<span class=\"keyword\">\1<\/span>/g
s/(new)/<span class=\"keyword\">\1<\/span>/g
s/(delete)/<span class=\"keyword\">\1<\/span>/g
s/(typedef)/<span class=\"keyword\">\1<\/span>/g
s/(return)/<span class=\"keyword\">\1<\/span>/g
s/(public:)/<span class=\"keyword\">\1<\/span>/g
s/(private:)/<span class=\"keyword\">\1<\/span>/g
s/(protected:)/<span class=\"keyword\">\1<\/span>/g
s/(const)/<span class=\"keyword\">\1<\/span>/g
s/(friend)/<span class=\"keyword\">\1<\/span>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class )/<span class=\"keyword\">class<\/span> /g

: url
# Replace URL's {Text here}[URL here]
s/\{([^\{]*)\}\[([^\[]*)\]/<a href=\"\2\" alt=\"\1\">\1<\/a>/g
 
That's evil :D

Care to explain the regexes for those of us not versed in them? And for those of us under the influence of alcohol :)
 
Code:
#!/usr/bin/sed -Ef

# Remove anything HTML is going to hate us for
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class="comment">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" alt="\1">\1<\/a>"/
	
	# Process no more
	b end
}

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {

	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b noloop
	
	# No, they are apparently not. This means we replace the quote with the correct span tag
	s/"([^"]*)/"<span class="text">\1/
	
	# Label the loop
	: loop
	
	# Output the text to stdout, as normal
	n
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"/!b loop
	
	# Ah, we found it. Replace it with the correct span tag.
	s/"/<\/span>"/
	
	# Parse only URL's in string literals.
	b url
	
	# Label the noloop branch
	: noloop
	
	# Search and replace the text
	s/"([^"]*)"/"<span class="text">\1<\/span>"/g
	
	# Parse only URL's in string literals.
	b url
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<span class=\"keyword\">\1<\/span>/g

# Types
s/(int)/<span class="keyword">\1<\/span>/g
s/(char)/<span class="keyword">\1<\/span>/g
s/(struct)/<span class="keyword">\1<\/span>/g

# Keywords
s/(switch)/<span class="keyword">\1<\/span>/g
s/(case)/<span class="keyword">\1<\/span>/g
s/(default)/<span class="keyword">\1<\/span>/g
s/(new)/<span class="keyword">\1<\/span>/g
s/(delete)/<span class="keyword">\1<\/span>/g
s/(typedef)/<span class="keyword">\1<\/span>/g
s/(return)/<span class="keyword">\1<\/span>/g
s/(const)/<span class="keyword">\1<\/span>/g
s/(friend)/<span class="keyword">\1<\/span>/g

s/(public:)/<span class="keyword">\1<\/span>/g
s/(private:)/<span class="keyword">\1<\/span>/g
s/(protected:)/<span class="keyword">\1<\/span>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class )/<span class="keyword">class<\/span> /g

: url
# Replace URL's {Text here}[URL here]
s/\{([^\{]*)\}\[([^\[]*)\]/<a href="\2" alt="\1">\1<\/a>/g

: end

This version is a little more robust, as I added some code to deal with string literals. I know of one edge case that won't be parsed properly. Does anyone see it? Hint: it is in the string literal parsing.
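To make the "keep reading until the quote closes" idea easier to follow, here is a rough Python sketch of it. The names `has_open_quote` and `highlight_strings` are hypothetical, and the handling of escaped quotes (`\"`) is only my assumption about one input worth testing, not necessarily the edge case hinted at above. Character literals such as `'"'` and quotes inside comments are ignored.

```python
import re

# A string literal: an opening quote, then any run of ordinary characters or
# backslash escapes, then a closing quote. Treating \" as part of the body is
# an assumption about what the highlighter should do.
STRING = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)


def has_open_quote(text: str) -> bool:
    """True if text ends inside a string literal."""
    without_closed = STRING.sub("", text)
    return '"' in without_closed


def highlight_strings(lines):
    """Join lines until every quote is closed, then wrap each literal in a span."""
    out, pending = [], ""
    for line in lines:
        pending = pending + "\n" + line if pending else line
        if has_open_quote(pending):
            continue  # like sed's N: pull in the next line and test again
        out.append(STRING.sub(r'"<span class="text">\1</span>"', pending))
        pending = ""
    if pending:  # unterminated literal at end of input: emit it unchanged
        out.append(pending)
    return out


if __name__ == "__main__":
    for chunk in highlight_strings(['char *s = "hello', 'world";', "int x;"]):
        print(chunk)
```

A literal that spans two input lines comes back as one joined chunk with an embedded newline, which mirrors what `N` does to sed's pattern space.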
 
So, yeah ... that version had a few flaws in it, which I only noticed as I pushed on towards bigger and better things. Of course.

Code:
#!/usr/bin/sed -Ef

# Remove anything HTML is going to hate us for
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class="comment">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" alt="\1">\1<\/a>"/
	
	# Process no more
	b end
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<keyword>\1<\/keyword>/g

# Types
s/(int)/<keyword>\1<\/keyword>/g
s/(char)/<keyword>\1<\/keyword>/g
s/(struct)/<keyword>\1<\/keyword>/g

# Keywords
s/(switch)/<keyword>\1<\/keyword>/g
s/(case)/<keyword>\1<\/keyword>/g
s/(default)/<keyword>\1<\/keyword>/g
s/(new)/<keyword>\1<\/keyword>/g
s/(delete)/<keyword>\1<\/keyword>/g
s/(typedef)/<keyword>\1<\/keyword>/g
s/(return)/<keyword>\1<\/keyword>/g
s/(const)/<keyword>\1<\/keyword>/g
s/(friend)/<keyword>\1<\/keyword>/g

s/(public:)/<keyword>\1<\/keyword>/g
s/(private:)/<keyword>\1<\/keyword>/g
s/(protected:)/<keyword>\1<\/keyword>/g

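# This one is special: we match "class " with its trailing space so that we do not also match class= in any <span> tags.
# The space stays outside the capture group, otherwise it would be emitted twice.
s/(class) /<keyword>\1<\/keyword> /g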

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {

	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b noloop
	
	# No, they are apparently not. This means we replace the quote with the correct span tag
	s/"([^"]*)/<text>"\1/
	
	# Label the loop
	: loop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"[^"]+"/!b loop
	
	# Ah, we found it. Replace it with the correct span tag.
	s/("[^"]+)"/\1"<\/text>/
	
	b endquote
	
	# Label the noloop branch
	: noloop
	
	s/"([^"]*)"/<text>"\1"<\/text>/g
	
	: endquote
	
	# String literals should not contain "syntax" highlighted code. So we remove all keyword tags from them
	# Label removetags
	: removetags

	# Remove <keyword> and </keyword> from the source file
	s/("[^<]+)<keyword>([^"]+")/\1\2/g
	s/("[^<]+)<\/keyword>([^"]+")/\1\2/g

	# Check if there are any more keyword tags left in this part of the string literal
	# if so, we branch back to removetags. We basically loop until this condition returns false.
	/"[^"]+<keyword>[^"]+"/b removetags
}

: url
# Replace URL's {Text here}[URL here]
s/\{([^\}]*)\}\[([^\[]*)\]/<a href="\2" alt="\1">\1<\/a>/g

: end

# Replace <keyword> and </keyword> with their span equivalent
s/<keyword>/<span class="keyword">/g
s/<\/keyword>/<\/span>/g
s/<text>/<span class="text">/g
s/<\/text>/<\/span>/g

New and improved version!
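The main change in this version is the two-step tagging: keywords first get a temporary `<keyword>` placeholder, the placeholders are stripped again inside string literals, and only at the very end is everything turned into `<span>` tags. A minimal Python sketch of that flow follows. It assumes the line has already been HTML-escaped (so no real `<` remains), the function name `highlight` and the shortened keyword list are made up, and the `\b` word boundaries are my own addition, so that `int` inside `printf` is left alone.

```python
import re

KEYWORDS = ("int", "char", "struct", "return", "const", "class")
# \b word boundaries are an addition; the sed script matches bare substrings.
KEYWORD = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")
STRING = re.compile(r'"[^"]*"')


def highlight(line: str) -> str:
    """Highlight one already HTML-escaped line of C++."""
    # Pass 1: tag every keyword with a placeholder, as the sed script does.
    tagged = KEYWORD.sub(r"<keyword>\1</keyword>", line)

    # Pass 2: inside string literals, drop the placeholders again.
    def strip_tags(match):
        return re.sub(r"</?keyword>", "", match.group(0))

    tagged = STRING.sub(strip_tags, tagged)

    # Pass 3: mark the body of each literal with a <text> placeholder.
    tagged = STRING.sub(lambda m: '"<text>' + m.group(0)[1:-1] + '</text>"', tagged)

    # Pass 4: only now turn placeholders into real spans, so the quotes in
    # class="..." can never be mistaken for string literal delimiters.
    tagged = re.sub(r"<(keyword|text)>", r'<span class="\1">', tagged)
    return re.sub(r"</(keyword|text)>", "</span>", tagged)


if __name__ == "__main__":
    print(highlight('const char *p = "return int";'))
```

Deferring the span conversion to the last pass is what makes the placeholder trick worthwhile: until then the only quotes and angle brackets in the line are ones the highlighter put there on purpose.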
 
Code:
#!/usr/bin/sed -Ef

###
 # Copyright (c) 2009 Bert JW Regeer <xistence@0x58.com>
 #
 # Permission to use, copy, modify, and distribute this software for any
 # purpose with or without fee is hereby granted, provided that the above
 # copyright notice and this permission notice appear in all copies.
 #
 # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 # WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 # MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 # ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 # WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 # ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 # OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 #
##

# Remove anything HTML is going to hate us for
s/</\&lt;/g
s/>/\&gt;/g

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" alt="\1">\1<\/a>"/
	
	# For pre-processor directives we do no other processing whatsoever!
	b
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<keyword>\1<\/keyword>/g

# Types
s/(int)/<keyword>\1<\/keyword>/g
s/(char)/<keyword>\1<\/keyword>/g
s/(struct)/<keyword>\1<\/keyword>/g

# Keywords
s/(switch)/<keyword>\1<\/keyword>/g
s/(case)/<keyword>\1<\/keyword>/g
s/(default)/<keyword>\1<\/keyword>/g
s/(new)/<keyword>\1<\/keyword>/g
s/(delete)/<keyword>\1<\/keyword>/g
s/(typedef)/<keyword>\1<\/keyword>/g
s/(return)/<keyword>\1<\/keyword>/g
s/(const)/<keyword>\1<\/keyword>/g
s/(friend)/<keyword>\1<\/keyword>/g

s/(public:)/<keyword>\1<\/keyword>/g
s/(private:)/<keyword>\1<\/keyword>/g
s/(protected:)/<keyword>\1<\/keyword>/g

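# This one is special: we match "class " with its trailing space so that we do not also match class= in any <span> tags.
# The space stays outside the capture group, otherwise it would be emitted twice.
s/(class) /<keyword>\1<\/keyword> /g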

# Multiline comments
# Issues:
#
# It is valid C/C++ to do this:
#
# /* this is a comment */ myclass = new myclass(); /* comment again */
#
# This parser uses one hell of a greedy regular expression; if you can figure out a way to make it non-greedy, you, sir, are a god.
# Back to the issue: instead of turning that line into <comment> <code> <comment>, it becomes a single <comment>.
# Yes, the entire line is now a comment. That is bad. So don't put multiple comments on the same line, and you will be fine!


/\/\*/ {
	# Are both the opening and closing comment delimiters on the same line? If so, branch to cnoloop
	/\/\*([^\*][^\/]+)*\*\//b cnoloop
	
	# No, they are apparently not, so we keep appending lines until the comment closes
	# s/(\/\*.*$)/<comment>\1/
	
	# Label the loop
	: cloop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find the closing comment delimiter yet? If not, we branch to label cloop
	/\/\*.*\*\//!b cloop
	
	# Label the noloop branch
	: cnoloop
	
	s/(\/\*.*\*\/)/<comment>\1<\/comment>/g
	
	: endcomment

	: cremovetags
	
	s/((\/\*)[^<]+)<keyword>(.*\*\/)/\1\3/g
	s/((\/\*)[^<]+)<\/keyword>(.*\*\/)/\1\3/g
	
	/\/\*[^<]+<keyword>([^\*][^\/]+)*\*\//b cremovetags
	
	# We don't want to process literal strings
	
	b end
}

/\/\/ .*/ {
	s/(\/\/ .*)/<comment>\1<\/comment>/
	
	: cpremovetags

	s/(\/\/ [^<]*)<keyword>(.*)$/\1\2/g
	s/(\/\/ [^<]*)<\/keyword>(.*)$/\1\2/g
	
	/\/\/ [^<]*<keyword>.*$/b cpremovetags
	
	# We don't want to process literal strings
	
	b end
}

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {
	# Are both the opening and closing quote on the same line? If so, branch to qnoloop
	/"[^"]*"/b qnoloop
	
	# Label the loop
	: qloop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find another quote character yet? If not, we branch to label qloop
	/"[^"]+"/!b qloop
	
	# Label the noloop branch
	: qnoloop
	
	# String literals should not contain "syntax" highlighted code. So we remove all keyword tags from them
	# Label removetags
	: qremovetags

	# Remove <keyword> and </keyword> from the source file
	s/("[^<]*)<keyword>([^"]+")/\1\2/g
	s/("[^<]*)<\/keyword>([^"]+")/\1\2/g

	# Check if there are any more keyword tags left in this part of the string literal
	# if so, we branch back to removetags. We basically loop until this condition returns false.
	/"[^"]*<keyword>[^"]+"/b qremovetags
	
	s/"([^"]+)"/"<text>\1<\/text>"/g
}

: end

s/<([^\/][^>]+)>/<span class="\1">/g
s/<\/[^>]+>/<\/span>/g

: url

# Replace URL's {Text here}[URL here]
s/\{([^\}]*)\}\[([^\[]*)\]/<a href="\2" alt="\1">\1<\/a>/g

Updated to the latest version I had sitting in my Subversion repository. I also added a license so that people who would like to use it can do so without infringing my copyright!
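On the greedy comment regex mentioned in the script's comments: in an engine with lazy quantifiers the fix is `.*?`, and for engines without them (POSIX extended regexes have none) there is a classic pattern that spells out "anything that does not close the comment". Here is a Python sketch of both. The names are hypothetical and I have not ported the second pattern back into the sed script, so treat it as a starting point rather than a drop-in fix.

```python
import re

# Lazy version: .*? stops at the first */ instead of the last one.
LAZY_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Same matches without a lazy quantifier: the body is either a non-star
# character, or a run of stars followed by something other than * or /.
ERE_COMMENT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")


def wrap_comments(text: str, pattern=LAZY_COMMENT) -> str:
    """Wrap every /* ... */ comment in a span, one span per comment."""
    return pattern.sub(
        lambda m: '<span class="comment">' + m.group(0) + "</span>", text
    )


if __name__ == "__main__":
    line = "/* one */ myclass = new myclass(); /* two */"
    print(wrap_comments(line))
    print(wrap_comments(line, ERE_COMMENT))
```

Both patterns leave the code between the two comments alone, which is the behaviour the script's comment block asks for.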
 
