.NET Discussion

.NET Issues, Problems, Code Samples, and Fixes

Visual Studio/VB.NET: How To Easily Document Your Code

If you’re a routine Visual Studio user like me, I don’t need to tell you how awesome Intellisense is. Not only would some of us be lost without it, but it also helps us be way more efficient programmers either through simple selection of methods or properties or by discovering new object members that maybe we didn’t know about previously. Additionally, one of the main benefits of Intellisense is that it tells you  about the item in question, for instance, that the String.IsNullOrEmpty() function “Indicates whether the specified System.String object is null or an System.String.Empty string.”

Visual Studio Intellisense

When writing my own objects, however, I used to find myself yearning for this kind of help for my own functions. Wouldn’t it be great to get Intellisense to tell me what that “GetUserInfo” function I wrote five weeks ago does rather than me having to go look it up? What about what those parameter names mean? Luckily, there is a way, and it is super easy.

For example, let’s say you have an object that returns its own permalink in a shared function called GetHTMLPermaLink(). To document it, simply place your cursor above the function and press the apostrophe key three times: '''. Automagically, the following pops up:

Function Documentation

All you have to do is fill in the blanks and viola, you have documented code! (NOTE: in C#, I believe the syntax is /// but I’m not sure.) To see this in action, after you fill in the appropriate information, go try to pull up your function somewhere and watch the magic:

Visual Studio Intellisense

More information on Visual Studio Code Documentation (C#)

Happy documenting!

Advertisements

May 20, 2009 Posted by | Tips & Tricks, VB.NET, Visual Studio.NET | , , | 5 Comments

My Foray into Regular Expressions with ASP.NET

I know it’s been a while since I posted here. I’ve been working pretty diligently on my newest creation, Quotidian Word, which will essentially be a “word-a-day” site that encourages users to actually learn the word and use it actively in everyday speech or writing. Right now it’s only an email harvester so that I can send an email to those interested when it’s ready, but I’m nearing the launch point every day 🙂

Anyways, the point of this post is to discuss my dip into the world of Regular Expressions. I needed to come up with a way to find the word I wanted in a sentence. Easy enough, right? Just mySentence.Contains("myword") and there you go. It would be nice if that’s all I had to do, but English is a funny language. Every word can assume many forms. For instance, if I want to find the word “entry” in a sentence, I also would like to find “entries”. Or more simply, if i’m looking for “dog” I also want to find “dogs”. Using the mySentence.Contains() method, I would not find “entries” and I would find only the “dog” in “dogs”.

Enter regular expressions.

I had previously shied away from them because when one looks at a regular expression (aka “regex”), it can be rather intimidating. Take this stock regular expression that comes with Visual Studio as a default for finding an email address:

\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*

Your reaction may be the same as mine was: WTF. But when you try to enter an email in a textbox validated by this regex, it knows if you are or not. So my problem was still, how do I find all forms of any word I choose? I took a couple factors into mind, such as, I will mostly be using more obscure words with less common roots, so that will help a bit, and I won’t be using too many really short words that can be blended into other words in a sentence.

My first thought was ok, how about I first try to find the whole word, then the whole word plus a suffix, then the whole word minus a letter plus a suffix, then a the whole word minus two letters plus a suffix:

\bentry\b|((\bentry|\bentr|\bent)(s|es|ies)\b)

I used the word “entry” as an example here. “\b” means either the beginning or the end of a word and “|” means “or”. The rest is just separated by parens. This worked out ok for a while until I realized that the English language has something like a thousand suffixes. I knew regexes were more powerful than that. There had to be an easier way.

And there was! I was sort of on the right track with the losing of the last two letters. In English, most words either simply append a suffix (dog -> dogs), drop one letter and append a suffix (happy -> happiness) or drop two letters and append a suffix (cactus -> cacti). Aside from the word “person” (person -> people) I could not think of an instance where a word dropped three or more and would profide me with enough remaining information to actually distinguish it from other words in the sentence. If you can, let me know.

So after a bit more research, and the testing from a handy regex tool called Expresso, my regex eventually evolved into: (?:ent(?:r|ry)?).+?\b

Explanation: “ent” is what I called the “root” of the word, essentially the letters remaining after the last two have been stripped off. “(?:” is a grouping that means “find whatever’s here, but don’t actually match it alone”. This way it doesn’t find only “ent” or “r” or “ry” and match it, but rather matches the whole thing all together. The clause “(?:r|ry)?” means find “r” OR “ry” (in addition to what comes before it, so “entr” or “entry”). The “?” at the end means the whole clause before it is optional, meaning if it’s not there, it’s ok. The “.+” means find any character after the previous clause for as many repetitions as you can, so for instance, if the word in the sentence is “entries” it will first find the “ent” then the “r” then any characters that follow “ies” up until the end of the word, “\b”. The “?” at the end of the “.+” just means take as few characters as possible up to the next clause, which is the “\b”.

Whew! That’s a lot. I found that this found about 99% of the words that I would be using in all their various forms. But then I got to thinking, what about prefixes? What if someone used something like “anti” or “pre” or something in front of a word to change it just slightly? Hence, my (nearly) final product:

(?:(?:\b\w+)?{1}(?:{2}|{3})?).+?\b

where {1} is the word minus the last two letters, {2} is the penultimate letter, and {3} is the last two letters. The optional clause at the beginning takes care of any prefix if it happens to exist.

Great! All done. It can pull it out of a string, no problem. Now if someone enters the word in a sentence in my textbox it will find…it… crap. It doesn’t work as is with the RegexValidator in ASP.NET on textboxes. Why? Because the validator is looking at the whole string in the textbox to see if it fits the regular expression. For instance, the email regular expression assumes that whatever you enter into that textbox is going to be an email, nothing else. If you enter a sentence into a textbox, it assumes that whatever you enter into that textbox is going to fit the regex.

In order to counter this problem, I put in a simple fix: I prepended and appended my regex for textboxes with “.*”, which means that it will find any characters before or after the word we’re looking for. Done! … Right?

So I thought, until I realized that when people enter sentences, usually they do so in multiline textboxes, as they’re easier to see everything you’re doing. This regex works until the user hits the “Enter” key and inserts a new line. After much research and hair pulling, I eventually found the solution:

^(.|\r|\n)*(?:(?:\b\w+)?{1}(?:{2}|{3})?).+?\b(.|\r|\n)*$

with {1},{2}, and {3} meaning the same thing. The interesting thing about the “.” character in regexes is it means “match any character…..except new lines and carriage returns”, which means if you’re using multiline textboxes, you have to account for that. So I had to prepend “^(.|\r|\n)*” and append “(.|\r|\n)*$” to my already unwieldy regex. “\r” is a carriage return and “\n” is a new line. The “^” symbol means start at the beginning of the string, and the “$” means continue to the end of the string, with the “*” symbols meaning repeat as often as necessary.

To finalize the product, I simply plugged all of this information into a shared function that returns the proper regex (based on a parameter that determines if I’m using it on a regular string, a textbox, or a multiline textbox) and set my validator’s ValidationExpression at runtime based on the word I was looking for. It actually works pretty well… so far…. 🙂

All in all, I think I like regexes. They are undoubtedly powerful and can save many programming hours if you know how to use them. I know the initial function I was using to find words in a sentence took me maybe two hours to write and it didn’t even work all the time. It was also about 200ish lines of code. My new function using regexes is 17 lines of actual code, and 7 of that is a Select statement for string/textbox/multiline textbox.

While the initial learning curve is quite steep for regexes, if you’re serious about programming, I highly suggest that you take a day or two to learn them, as that day or two investment could save you weeks or months down the line of your programming career.

Some regex resources:

What’s your craziest regex?

May 15, 2009 Posted by | ASP.NET, Javascript, Regex, VB.NET, Visual Studio.NET | 1 Comment