
Syntax highlighter for C++

X-Istence

Political User
#1
Please see this post for the most up-to-date sed script: http://forum.osnn.net/showpost.php?p=856765&postcount=8

--

I am a sick, sick person, and I love regexes :p. I have written a rather quick and dirty C++ parser that does a few things (I am using it to create my portfolio website). It is a sed script that goes through the C++ files it is handed and marks them up as HTML.

It also does some other cool things. For example, I wanted to embed links in my C++ source files so that they come out as real links on the web page, so I took a page from Markdown.

Code:
{Text here}[url here]
This lets the script turn links embedded in the C++ source files into HTML links. I figured I'd post it here.
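
To make the link syntax concrete, here is a minimal Python sketch of the same substitution. It is not the sed script itself; the function name `linkify` and the example URL are invented for illustration.

```python
import re

# {text}[url]: the text may not contain "}", the URL may not contain "]".
LINK = re.compile(r"\{([^}]*)\}\[([^\]]*)\]")


def linkify(line: str) -> str:
    """Replace every {text}[url] in line with an HTML anchor (hypothetical helper)."""
    return LINK.sub(r'<a href="\2">\1</a>', line)


if __name__ == "__main__":
    # example.com is a placeholder URL, not one from the thread
    print(linkify("// See {the docs}[http://example.com/docs] for details"))
```

Ordinary C++ such as `a[0]` is left alone, because the pattern only fires when a `{...}` group is immediately followed by a `[...]` group.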

Code:
#!/usr/bin/sed -E -f 

# Remove anything HTML is going to hate us for.
# The ampersand has to go first, or it would re-escape the entities made below.
s/&/\&amp;/g
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class=\"comment\">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class=\"preproc\">\1<\/span>/
	
	# We want to make links out of the headers we have written.
	# Appending "tml" to the captured name turns foo.h into foo.html (so this only suits .h files).
	s/#include \"(.*)\"/#include "<a href=\"\.\/\1tml\" title=\"\1\">\1<\/a>"/
	
	# Only process URL's, nothing else!
	b url
}

# Replace text within quotes
s/\"([^"]*)\"/\"<span class=\"text\">\1<\/span>\"/g

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<span class=\"keyword\">\1<\/span>/g

# Types
s/(int)/<span class=\"keyword\">\1<\/span>/g
s/(char)/<span class=\"keyword\">\1<\/span>/g
s/(struct)/<span class=\"keyword\">\1<\/span>/g

# Keywords
s/(switch)/<span class=\"keyword\">\1<\/span>/g
s/(case)/<span class=\"keyword\">\1<\/span>/g
s/(default)/<span class=\"keyword\">\1<\/span>/g
s/(new)/<span class=\"keyword\">\1<\/span>/g
s/(delete)/<span class=\"keyword\">\1<\/span>/g
s/(typedef)/<span class=\"keyword\">\1<\/span>/g
s/(return)/<span class=\"keyword\">\1<\/span>/g
s/(public:)/<span class=\"keyword\">\1<\/span>/g
s/(private:)/<span class=\"keyword\">\1<\/span>/g
s/(protected:)/<span class=\"keyword\">\1<\/span>/g
s/(const)/<span class=\"keyword\">\1<\/span>/g
s/(friend)/<span class=\"keyword\">\1<\/span>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class )/<span class=\"keyword\">class<\/span> /g

: url
# Replace URLs written as {Text here}[URL here]
s/\{([^}]*)\}\[([^]]*)\]/<a href=\"\2\" title=\"\1\">\1<\/a>/g
 

X-Istence

Political User
#5
Code:
#!/usr/bin/sed -E -f 

# Remove anything HTML is going to hate us for.
# The ampersand has to go first, or it would re-escape the entities made below.
s/&/\&amp;/g
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class="comment">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written.
	# Appending "tml" to the captured name turns foo.h into foo.html (so this only suits .h files).
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" title="\1">\1<\/a>"/
	
	# Process no more
	b end
}

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {

	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b noloop
	
	# No, they are apparently not. This means we replace the quote with the correct span tag
	s/"([^"]*)/"<span class="text">\1/
	
	# Label the loop
	: loop
	
	# Print the current line as normal and read in the next one
	n
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"/!b loop
	
	# Ah, we found it. Replace it with the correct span tag.
	s/"/<\/span>"/
	
	# Parse only URL's in string literals.
	b url
	
	# Label the noloop branch
	: noloop
	
	# Search and replace the text
	s/"([^"]*)"/"<span class="text">\1<\/span>"/g
	
	# Parse only URL's in string literals.
	b url
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<span class=\"keyword\">\1<\/span>/g

# Types
s/(int)/<span class="keyword">\1<\/span>/g
s/(char)/<span class="keyword">\1<\/span>/g
s/(struct)/<span class="keyword">\1<\/span>/g

# Keywords
s/(switch)/<span class="keyword">\1<\/span>/g
s/(case)/<span class="keyword">\1<\/span>/g
s/(default)/<span class="keyword">\1<\/span>/g
s/(new)/<span class="keyword">\1<\/span>/g
s/(delete)/<span class="keyword">\1<\/span>/g
s/(typedef)/<span class="keyword">\1<\/span>/g
s/(return)/<span class="keyword">\1<\/span>/g
s/(const)/<span class="keyword">\1<\/span>/g
s/(friend)/<span class="keyword">\1<\/span>/g

s/(public:)/<span class="keyword">\1<\/span>/g
s/(private:)/<span class="keyword">\1<\/span>/g
s/(protected:)/<span class="keyword">\1<\/span>/g

# This one is special, if we are not careful we also match class= from all the above <span>'s.
s/(class )/<span class="keyword">class<\/span> /g

: url
# Replace URLs written as {Text here}[URL here]
s/\{([^}]*)\}\[([^]]*)\]/<a href="\2" title="\1">\1<\/a>/g

: end
This version is a little more robust, as I added some code to deal with string literals. I know of one edge case that won't be parsed properly. Does anyone see it? Hint: it is in the string literal parsing.
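
The thread never says which edge case is meant. One plausible candidate (a guess, not confirmed by the author) is an escaped quote inside a literal, which a pattern like `"[^"]*"` cuts short at the `\"`. A minimal Python sketch of a pattern that steps over backslash escapes follows; `highlight_strings` is an invented name, and this is not how the sed script does it.

```python
import re

# A string literal: an opening quote, then any run of either
#   - a character that is neither a quote nor a backslash, or
#   - a backslash followed by any one character (an escape such as \"),
# and finally the closing quote.
STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


def highlight_strings(line: str) -> str:
    """Wrap the contents of each double-quoted literal in a span (hypothetical helper)."""
    return STRING.sub(r'"<span class="text">\1</span>"', line)


if __name__ == "__main__":
    print(highlight_strings(r'puts("say \"hi\"");'))
```

The same alternation uses only constructs that POSIX extended regexes also have, so it could probably be carried over to `sed -E`, though that has not been tried here.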
 

X-Istence

Political User
#6
So, yeah... that version had a few flaws in it, which I only noticed as I pushed on towards bigger and better things. Of course.

Code:
#!/usr/bin/sed -E -f 

# Remove anything HTML is going to hate us for.
# The ampersand has to go first, or it would re-escape the entities made below.
s/&/\&amp;/g
s/</\&lt;/g
s/>/\&gt;/g

# Multiline comments

/\/\*\*/,/\*\*\//{

	# Replace the first instance with a span
	/\/\*\*/c\
	<span class="comment">/**

	# Close the span
	/\*\*\//c\
	**/</span>

	# We parse URL's in comments, but nothing else!
	b url
}

/\/\// {
	s/(\/\/ .*)/<span class="comment">\1<\/span>/
	
	b url
}

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written.
	# Appending "tml" to the captured name turns foo.h into foo.html (so this only suits .h files).
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" title="\1">\1<\/a>"/
	
	# Process no more
	b end
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<keyword>\1<\/keyword>/g

# Types
s/(int)/<keyword>\1<\/keyword>/g
s/(char)/<keyword>\1<\/keyword>/g
s/(struct)/<keyword>\1<\/keyword>/g

# Keywords
s/(switch)/<keyword>\1<\/keyword>/g
s/(case)/<keyword>\1<\/keyword>/g
s/(default)/<keyword>\1<\/keyword>/g
s/(new)/<keyword>\1<\/keyword>/g
s/(delete)/<keyword>\1<\/keyword>/g
s/(typedef)/<keyword>\1<\/keyword>/g
s/(return)/<keyword>\1<\/keyword>/g
s/(const)/<keyword>\1<\/keyword>/g
s/(friend)/<keyword>\1<\/keyword>/g

s/(public:)/<keyword>\1<\/keyword>/g
s/(private:)/<keyword>\1<\/keyword>/g
s/(protected:)/<keyword>\1<\/keyword>/g

# This one is special: we only match "class" followed by a space, so that class= in generated tags is left alone.
s/(class) /<keyword>\1<\/keyword> /g

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {

	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b noloop
	
	# No, they are apparently not. This means we replace the quote with the correct span tag
	s/"([^"]*)/<text>"\1/
	
	# Label the loop
	: loop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"[^"]+"/!b loop
	
	# Ah, we found it. Replace it with the correct span tag.
	s/("[^"]+)"/\1"<\/text>/
	
	b endquote
	
	# Label the noloop branch
	: noloop
	
	s/"([^"]*)"/<text>"\1"<\/text>/g
	
	: endquote
	
	# String literals should not contain "syntax" highlighted code. So we remove all keyword tags from them
	# Label removetags
	: removetags

	# Remove <keyword> and </keyword> from the source file
	s/("[^<]+)<keyword>([^"]+")/\1\2/g
	s/("[^<]+)<\/keyword>([^"]+")/\1\2/g

	# Check if there are any more keyword tags left in this part of the string literal
	# if so, we branch back to removetags. We basically loop until this condition returns false.
	/"[^"]+<keyword>[^"]+"/b removetags
}

: url
# Replace URLs written as {Text here}[URL here]
s/\{([^}]*)\}\[([^]]*)\]/<a href="\2" title="\1">\1<\/a>/g

: end

# Replace <keyword> and </keyword> with their span equivalent
s/<keyword>/<span class="keyword">/g
s/<\/keyword>/<\/span>/g
s/<text>/<span class="text">/g
s/<\/text>/<\/span>/g
New and improved version!
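
The script above tags keywords everywhere first and then strips the tags back out of string literals. A different technique, sketched here in Python rather than sed, is to match string literals and keywords with one alternation in a single left-to-right pass, so a keyword inside a string is never tagged in the first place. The `\b` word boundaries also keep `int` from matching inside `print`. The names `KEYWORDS`, `TOKEN` and `highlight` are invented for this sketch, and the keyword list is only a subset of the script's.

```python
import html
import re

# Subset of the keywords the sed script highlights (assumption: enough for a demo).
KEYWORDS = ("int", "char", "struct", "switch", "case", "default", "new",
            "delete", "typedef", "return", "const", "friend", "class")

# One alternation: a string literal is tried first, then a whole-word keyword.
TOKEN = re.compile(
    r'(?P<text>"(?:[^"\\]|\\.)*")'
    r"|(?P<keyword>\b(?:" + "|".join(KEYWORDS) + r")\b)"
)


def highlight(line: str) -> str:
    """Highlight string literals and keywords in one pass (hypothetical helper)."""
    out, pos = [], 0
    for match in TOKEN.finditer(line):
        # Plain code between tokens only needs HTML escaping.
        out.append(html.escape(line[pos:match.start()], quote=False))
        kind = match.lastgroup  # "text" or "keyword"
        body = html.escape(match.group(), quote=False)
        out.append(f'<span class="{kind}">{body}</span>')
        pos = match.end()
    out.append(html.escape(line[pos:], quote=False))
    return "".join(out)


if __name__ == "__main__":
    print(highlight('return "new int";'))
```

Because the output is assembled piece by piece, the generated `class="..."` attributes are never scanned again, which sidesteps the `class=` problem the script has to work around.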
 

X-Istence

Political User
#8
Code:
#!/usr/bin/sed -E -f 

###
 # Copyright (c) 2009 Bert JW Regeer <xistence@0x58.com>
 #
 # Permission to use, copy, modify, and distribute this software for any
 # purpose with or without fee is hereby granted, provided that the above
 # copyright notice and this permission notice appear in all copies.
 #
 # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 # WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 # MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 # ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 # WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 # ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 # OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 #
##

# Remove anything HTML is going to hate us for.
# The ampersand has to go first, or it would re-escape the entities made below.
s/&/\&amp;/g
s/</\&lt;/g
s/>/\&gt;/g

# Pre-processor
/^#.*/ {
	# We add a span to syntax highlight it
	s/(#.*)/<span class="preproc">\1<\/span>/
	
	# We want to make links out of the headers we have written.
	# Appending "tml" to the captured name turns foo.h into foo.html (so this only suits .h files).
	s/#include \"(.*)\"/#include "<a href="\.\/\1tml" title="\1">\1<\/a>"/
	
	# For pre-processor directives we do no other processing what so ever!
	b
}

# We syntax highlight the standard library stuff (not sure if I want to turn this on)

# s/(std::[^ (]+)/<keyword>\1<\/keyword>/g

# Types
s/(int)/<keyword>\1<\/keyword>/g
s/(char)/<keyword>\1<\/keyword>/g
s/(struct)/<keyword>\1<\/keyword>/g

# Keywords
s/(switch)/<keyword>\1<\/keyword>/g
s/(case)/<keyword>\1<\/keyword>/g
s/(default)/<keyword>\1<\/keyword>/g
s/(new)/<keyword>\1<\/keyword>/g
s/(delete)/<keyword>\1<\/keyword>/g
s/(typedef)/<keyword>\1<\/keyword>/g
s/(return)/<keyword>\1<\/keyword>/g
s/(const)/<keyword>\1<\/keyword>/g
s/(friend)/<keyword>\1<\/keyword>/g

s/(public:)/<keyword>\1<\/keyword>/g
s/(private:)/<keyword>\1<\/keyword>/g
s/(protected:)/<keyword>\1<\/keyword>/g

# This one is special: we only match "class" followed by a space, so that class= in generated tags is left alone.
s/(class) /<keyword>\1<\/keyword> /g

# Multiline comments
# Issues:
#
# It is valid C/C++ to do this:
#
# /* this is a comment */ myclass = new myclass(); /* comment again */
#
# This parser has one hell of a greedy regular expression. If you can figure out a way to make it
# non-greedy, you, sir, are a god. Back to the issue: instead of turning that line into
# <comment> <code> <comment>, it becomes a single <comment>. Yes, the entire line is now a comment.
# That is bad. So don't use multiple comments on the same line, and you will be fine!


/\/\*/ {
	# Are both the opening /* and the closing */ on the same line? If so, branch to cnoloop
	/\/\*.*\*\//b cnoloop
	
	# No, they are apparently not. The tags are added at cnoloop once the whole comment has been read.
	# s/(\/\*.*$)/<comment>\1/
	
	# Label the loop
	: cloop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find the closing */ yet? If not, we branch back to cloop
	/\/\*.*\*\//!b cloop
	
	# Label the noloop branch
	: cnoloop
	
	s/(\/\*.*\*\/)/<comment>\1<\/comment>/g
	
	: endcomment

	: cremovetags
	
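	# Strip one <keyword> and one </keyword> per pass ([^<]* so a tag right after /* is caught too)
	s/((\/\*)[^<]*)<keyword>(.*\*\/)/\1\3/g
	s/((\/\*)[^<]*)<\/keyword>(.*\*\/)/\1\3/g
	
	# Loop while a <keyword> tag is still left inside the comment
	/\/\*[^<]*<keyword>.*\*\//b cremovetags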
	
	# We don't want to process literal strings
	
	b end
}

/\/\/ .*/ {
	s/(\/\/ .*)/<comment>\1<\/comment>/
	
	: cpremovetags

	s/(\/\/ [^<]*)<keyword>(.*)$/\1\2/g
	s/(\/\/ [^<]*)<\/keyword>(.*)$/\1\2/g
	
	/\/\/ [^<]*<keyword>.*$/b cpremovetags
	
	# We don't want to process literal strings
	
	b end
}

# Literal strings, we want to highlight them, but there is a catch
# in C++ we are allowed to start a literal string on one line, and then continue it on the next line
# this means we need to make sure we parse that correctly!

/"/ {
	# Are both the opening and closing quote on the same line? If so, branch to noloop
	/"[^"]*"/b qnoloop
	
	# Label the loop
	: qloop
	
	# Append the next line from the input file to the current line, move cursor forward by one
	N
	
	# Did we find another quote character yet? If not, we branch to label loop
	/"[^"]+"/!b qloop
	
	# Label the noloop branch
	: qnoloop
	
	# String literals should not contain "syntax" highlighted code. So we remove all keyword tags from them
	# Label removetags
	: qremovetags

	# Remove <keyword> and </keyword> from the source file
	s/("[^<]*)<keyword>([^"]+")/\1\2/g
	s/("[^<]*)<\/keyword>([^"]+")/\1\2/g

	# Check if there are any more keyword tags left in this part of the string literal
	# if so, we branch back to removetags. We basically loop until this condition returns false.
	/"[^"]*<keyword>[^"]+"/b qremovetags
	
	s/"([^"]+)"/"<text>\1<\/text>"/g
}

: end

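# Turn the placeholder tags (<keyword>, <text>, <comment>) into spans. Every real < and >
# in the source was escaped at the top, so any tag still left here is one of ours.
s/<([^\/][^>]+)>/<span class="\1">/g
s/<\/[^>]+>/<\/span>/g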

: url

# Replace URLs written as {Text here}[URL here]
s/\{([^}]*)\}\[([^]]*)\]/<a href="\2" title="\1">\1<\/a>/g
Updated to the latest version I had sitting in my Subversion repository. I also added a license, so that people who would like to use it can now do so without infringing upon my copyright!
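
The comment block in the script asks for a non-greedy way to match `/* ... */`. POSIX regexes have no lazy quantifier, but a common workaround is to spell out "anything that does not end the comment" instead of using `.*`. The Python sketch below uses only constructs that extended regexes also offer, so the pattern should carry over to `sed -E`, although it has not been run there. `BLOCK_COMMENT` and `mark_comments` are invented names.

```python
import re

# "/*", then any mix of
#   - a character that is not a star, or
#   - a run of stars followed by something that is neither a star nor a slash,
# then one or more stars and the closing "/".
BLOCK_COMMENT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")


def mark_comments(line: str) -> str:
    """Wrap each block comment separately in <comment> placeholder tags (hypothetical helper)."""
    return BLOCK_COMMENT.sub(r"<comment>\g<0></comment>", line)


if __name__ == "__main__":
    print(mark_comments("/* this is a comment */ myclass = new myclass(); /* comment again */"))
```

With this pattern the code between the two comments stays outside the tags, which is the case the script's own comment warns about.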
 
