Ask a Jedi: Best way to trim text

Sal asks:

Just curious: what's the best way (or how do you handle it) to truncate a paragraph to show only, say, 500 chars? I have a newsletter that I'm emailing out, and I only wanna show 500 chars of each article in the email.

Ah, I love it when folks ask me the "best" way to do things since no matter what I say, I'm not wrong (grin). Seriously though - here are multiple ways to trim text.

Let's first start off with a block of text that we will use for our tests:

<cfsavecontent variable="quote">
The Constitution is not an instrument for the government to restrain the people, it is an instrument for the people to restrain the government -- lest it come to dominate our lives and interests. Patrick Henry.
</cfsavecontent>

So the quickest way to trim text is with left:

<cfoutput>#left(quote,100)#</cfoutput>

However, if you use this on the text, you get:

The Constitution is not an instrument for the government to restrain the people, it is an instrumen

As you can see, the last word in the trimmed text, instrument, was cut off before the final t. This isn't a horrible thing, of course, but it could be done better. ColdFusion does ship with a wrap() function, but it won't crop the text; it simply breaks the text into lines of a given maximum length. It does break at whitespace, though, so why not wrap the text and then use a list function to grab just the first line?

<cfoutput>#listFirst(wrap(quote,100),chr(10))#</cfoutput>

This returns a nicer trim:

The Constitution is not an instrument for the government to restrain the people, it is an

This works nicely, but I kinda feel 'dirty' doing it like this, so why not see if a UDF exists for this? Turns out one does: FullLeft. This UDF lets me do this instead:

<cfoutput>#fullleft(quote,100)#</cfoutput>

In theory it does a lot less work than wrap(), so it should be quicker.
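The FullLeft source isn't reproduced in this post, so here is a hypothetical sketch of how a word-safe left() can work. fullLeftSketch is a made-up name and this is not the actual FullLeft code. It only looks at characters from the limit backwards until it hits whitespace, which is the intuition behind doing less work than wrap().

```cfml
<cfscript>
// Hypothetical sketch, NOT the actual FullLeft UDF source:
// return at most maxChars characters of str without splitting a word.
function fullLeftSketch(str, maxChars) {
    var i = 0;
    // Already short enough: nothing to trim.
    if (len(str) lte maxChars) {
        return str;
    }
    // Walk backwards from the character just past the limit. The first
    // whitespace character found is where the cut can safely happen.
    for (i = maxChars + 1; i gt 1; i = i - 1) {
        if (reFind("[[:space:]]", mid(str, i, 1))) {
            return left(str, i - 1);
        }
    }
    // No whitespace at all (one giant word): fall back to a hard cut.
    return left(str, maxChars);
}
</cfscript>
```

It is called the same way as the original, e.g. #fullLeftSketch(quote, 100)# inside a cfoutput block.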

Ok, so we're done, right? Well, what if we modify the quote a bit:

<cfsavecontent variable="quote">
The <a href="http://www.coldfusionjedi.com">Constitution</a> is <b>not</b> an instrument for the government to restrain the people, it is an instrument for the people to restrain the government -- lest it come to dominate our lives and interests. Patrick Henry.
</cfsavecontent>

As you can see, I've added some HTML to the text, and that HTML messes up my count. If I want to show 100 characters, I don't want the markup to count at all. In fact, I probably don't want to show HTML at all, since a cut in the wrong place could leave a tag half-written or unclosed. I can fix that easily enough by stripping the tags before trimming:

<cfset quote = rereplace(quote, "<.*?>", "", "all")>

Another issue is whitespace. Now, this is a contrived example, but it could happen in a live system:

<cfsavecontent variable="quote">
The <a href="http://www.coldfusionjedi.com">Constitution</a> is <b>not</b>










an
instrument for the government to restrain the people, it is an instrument for
the people to restrain the government -- lest it come to dominate our lives and interests.

Patrick Henry.
</cfsavecontent>

You can use another regex to handle this:

<cfset quote = rereplace(quote, "[[:space:]]+", " ", "all")>

Alternatively, if you use the wrap() function, it takes an optional third argument that strips out existing line breaks and carriage returns.
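Here's a rough sketch of that wrap()-based route, assuming the third (strip) argument behaves as described above. The sample text and the plainQuote and firstLine variable names are illustrative. Note that strip only targets line breaks, so runs of ordinary spaces and tabs are left alone and still count toward the limit; the [[:space:]]+ regex is the more thorough option.

```cfml
<!--- Sample text with markup and stray blank lines (stands in for an article body). --->
<cfsavecontent variable="quote">
The <b>Constitution</b> is not


an instrument for the government to restrain the people, it is an instrument for
the people to restrain the government.
</cfsavecontent>
<!--- Drop tags first so markup does not count toward the limit. --->
<cfset plainQuote = reReplace(quote, "<.*?>", "", "all")>
<!--- Third argument true: ask wrap() to strip existing line breaks before wrapping. --->
<cfset firstLine = listFirst(wrap(plainQuote, 100, true), chr(10))>
<!--- trim() tidies any leading space or carriage return left next to the break. --->
<cfoutput>#trim(firstLine)#</cfoutput>
```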

Lastly, it sometimes helps to visually flag text that has been trimmed. Normally this is done with an ellipsis ("..."). You can mimic this effect like so:

<cfif len(quote) gt 100>
   <cfset trimmedQuote = fullLeft(quote, 100)>
   <cfset trimmedQuote &= "...">
<cfelse>
   <cfset trimmedQuote = quote>
</cfif>
<cfoutput>#trimmedQuote#</cfoutput>

I just check the length of the original quote and conditionally perform a trim and add the "...".
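Pulling the steps together for Sal's 500-character newsletter, here's a sketch of a single helper. trimForEmail is a hypothetical name, and it departs from the post in two small ways: tags are replaced with a space instead of an empty string (so "one<br>two" doesn't fuse into "onetwo"), and the word-boundary cut is done inline rather than by calling FullLeft. HTML entities are not handled.

```cfml
<cfscript>
// Hypothetical helper (not from the post): strip tags, collapse
// whitespace, cut on a word boundary, and flag the cut with "...".
function trimForEmail(text, maxChars) {
    var plain = "";
    var i = 0;
    // Replace each tag with a space so "one<br>two" doesn't become "onetwo".
    plain = reReplace(text, "<[^>]*>", " ", "all");
    // Collapse runs of spaces, tabs, and line breaks into a single space.
    plain = trim(reReplace(plain, "[[:space:]]+", " ", "all"));
    // Short enough already: return it unchanged, with no "...".
    if (len(plain) lte maxChars) {
        return plain;
    }
    // Walk back from just past the limit to the nearest space.
    for (i = maxChars + 1; i gt 1; i = i - 1) {
        if (mid(plain, i, 1) eq " ") {
            return left(plain, i - 1) & "...";
        }
    }
    // No space before the limit: hard cut.
    return left(plain, maxChars) & "...";
}
</cfscript>
```

For the newsletter, the call would be along the lines of trimForEmail(articleText, 500), where articleText (a hypothetical name) holds each article's body. Because whitespace is collapsed first, checking for a single space character is enough to find a word boundary.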

Comments

nice, detailed post, might come in handy soon. thanks ray!
# Posted By Chris H | 5/28/08 9:11 AM
Wow. I've had to do this many times but have never put that much thought into it. Thanks for the great solution.
# Posted By David S | 5/28/08 9:59 AM
could be better off trimming it at source to avoid unnecessary db traffic.

in mssql select the column with something like:

substring(yourTextyCol,1,100)

then stick your "..." after it
# Posted By Luke | 5/28/08 10:01 AM
@Luke - Well this suffers the same problem as Left() does. However, you do have a point - it may make sense to do the 'nice left' once and store the result.
# Posted By Raymond Camden | 5/28/08 10:06 AM
thanks yo!

;-)
# Posted By sal | 5/28/08 10:17 AM
I never thought of using the Wrap function to do this - that's neat. My custom function uses Find and Left:

plaintext = ReReplaceNoCase(htmltext, "<[^>]+>", " ", "all");
Return Left(plaintext, Find(" ", plaintext, 100)) & "&hellip;";
# Posted By John Whish | 5/28/08 11:43 AM
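John's snippet cuts at the first space at or after the limit, so the result can run a little past 100 characters instead of stopping short. As written it also appends the ellipsis even to short text, and if there is no space after position 100, Find() returns 0, leaving Left() with a count of 0: at best an empty string followed by the ellipsis, and possibly an error depending on the ColdFusion version. A hedged sketch guarding both cases; findLeftSketch is a hypothetical name:

```cfml
<cfscript>
// Hypothetical sketch built on the Find()/Left() idea above:
// keep text up to the first space at or after maxChars.
function findLeftSketch(htmltext, maxChars) {
    // Replace tags with spaces, as in the original snippet.
    var plaintext = reReplaceNoCase(htmltext, "<[^>]+>", " ", "all");
    var cutAt = 0;
    // Short text needs no cut and no ellipsis.
    if (len(plaintext) lte maxChars) {
        return plaintext;
    }
    // Position of the first space at or after maxChars (0 if none).
    cutAt = find(" ", plaintext, maxChars);
    // No usable space to cut at: return the text untouched.
    if (cutAt lte 1) {
        return plaintext;
    }
    return left(plaintext, cutAt - 1) & "&hellip;";
}
</cfscript>
```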
Another thing to keep in mind is HTML entities. You may wish to convert the entities to the characters they represent to get a better character or word count. So if your user entered something like this:

Using characters like “é”, “ü”, “etc”. is ok.

...would be converted to this...

Using characters like “é”, “ü”, “etc”. is ok.

I ran into this problem awhile back and created a nice little JavaScript function to do this, but it could easily be done in ColdFusion as well.
# Posted By Doug | 5/28/08 1:55 PM
oops, that first line got converted. It should have been:

Using characters like &ldquo;&eacute;&rdquo;, &ldquo;&uuml;&rdquo;, &ldquo;etc&rdquo;. is ok.
# Posted By Doug | 5/28/08 1:56 PM
Excellent point Doug. It may even be worthwhile to just delete them. Now that may result in some odd misspellings - but it may be the simplest solution.
# Posted By Raymond Camden | 5/28/08 2:01 PM
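Ray's delete-them option comes down to a one-line regex. A minimal sketch, with stripEntities as a hypothetical name; as he notes, accented words lose letters (caf&eacute; becomes caf), and &amp; disappears along with the rest.

```cfml
<cfscript>
// Hypothetical helper for the "just delete them" option: remove named
// entities (&eacute;) and numeric ones (&#233;, &#xE9;) before counting.
// In a CFML string literal, a literal # must be written as ##.
function stripEntities(text) {
    return reReplace(text, "&(##[0-9]+|##[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);", "", "all");
}
</cfscript>
```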
yeah, if you don't have to worry about HTML entities, special characters, etc., you could do this in MySQL via
SELECT CONCAT( LEFT( TextToSelect, 500 ), '...' ) FROM Blah
# Posted By Chris H | 5/28/08 2:29 PM
Awesome post, with consideration of the HTML. Nice.
# Posted By Joshua Curtiss | 5/28/08 2:42 PM
What if the text contains HTML-tags like <b>, <i>, <a> etc.

I have had trouble wrapping text containing these kind of tags. The problem is when it cuts the text between a start tag and an end tag.
# Posted By Mikkel Johansen | 5/29/08 12:21 AM
@Mikkel: Um.... you did read the blog entry, right? I cover HTML.
# Posted By Raymond Camden | 5/29/08 5:41 AM
@Ray: I did read the part where you replace any tag with "blank".

My "question" should have been: What if I want to keep the html-tags without breaking the start/end-tag when wrapping the text.
# Posted By Mikkel Johansen | 5/29/08 6:08 AM
Ah - that gets significantly more complex perhaps. You could do this:

1) Remove html
2) Find FullLeft(N)
3) If fullLeft(n) ends at "the", go back to original content (with html), find "the", and end there.

That would let you keep the html and wrap at text not including html, but the N value would be <N as you didn't count the html. Another issue is that it wouldn't stop you from ending with <b>the and having an unmatched tag.

You could write code to determine if your fullleft(n) result is inside HTML. This is done by looking for <X> </X> around your result. If you find it, you either move to the end of </x> or go to before <x>.
# Posted By Raymond Camden | 5/29/08 6:15 AM
@Mikkel: My "question" should have been: What if I want to keep the html-tags without breaking the start/end-tag when wrapping the text.

You would almost need to create some sort of HTML parser for that. Have you ever looked at the HTML source for a ColdFusion error message? If you notice, it adds a bunch of close tags (</b></p></td></tr></table>...) before it adds the Error message source. It's not calculating those tags, it's just adding a bunch of them to be safe and they don't always work.

Most likely you could create a regular expression to find all the <BLOCK> tags, and if any of them were still open, add their closing tags to the end. I think that would be crazy complicated, and I'd have to ask if it's worth it.
# Posted By Doug | 5/29/08 2:22 PM
I used this solution that Ben Nadel came up with to close truncated html. It does a pretty good job.

http://www.bennadel.com/blog/982-Ask-Ben-Closing-X...
# Posted By anthony | 6/2/08 4:23 PM
Is it possible to trim all text around a tag? For instance, trim all the text in your example before or after "<a href="http://www.coldfusionjedi.com">Constitutio..."?
# Posted By Duane Hardy | 6/11/08 8:49 AM
In theory. You would write a regex to match

(1 or more spaces)(link including closing a tag)(1 or more spaces)

and replace with

(link)

Let me give it a try.
# Posted By Raymond Camden | 6/11/08 8:50 AM
Not heavily tested, but this seems to work. I assumed you meant replace two or more with one:

<cfset text = rereplacenocase(text, "[[:space:]]+(<a.*?>.*?</a>)[[:space:]]+"," \1 ")>

If you really want NO space, period, just change the 3rd arg to be just \1, not (space)\1(space).
# Posted By Raymond Camden | 6/11/08 8:53 AM
What I am ultimately trying to do is add target="_blank" to any a tag with an external href. I was looking at trimming a string provided by a web service down to just the <a> tag and using JavaScript for all external links. Could I add target="_blank" with ColdFusion instead? Do you know any methods?
# Posted By Duane Hardy | 6/11/08 9:01 AM
Oh, that's simpler. You can't do it (afaik) in one line, but just get all the links (use reMatch in CF8) and then replace any non-local link with the modified version.
# Posted By Raymond Camden | 6/11/08 9:09 AM
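Ray's reMatch suggestion can be sketched with reMatchNoCase(), the case-insensitive counterpart of reMatch(). addBlankTargets and its localHost parameter are hypothetical names. The pattern only catches opening a tags with a double-quoted absolute http or https href, so treat it as a starting point rather than an HTML parser.

```cfml
<cfscript>
// Hypothetical sketch: add target="_blank" to every opening <a> tag whose
// href is absolute, unless it mentions localHost or already has a target.
function addBlankTargets(html, localHost) {
    var links = reMatchNoCase('<a\s[^>]*href="https?://[^"]*"[^>]*>', html);
    var i = 0;
    var newLink = "";
    for (i = 1; i lte arrayLen(links); i = i + 1) {
        if (not findNoCase(localHost, links[i]) and not findNoCase("target=", links[i])) {
            // Insert the attribute just before the closing ">".
            newLink = left(links[i], len(links[i]) - 1) & ' target="_blank">';
            html = replace(html, links[i], newLink, "all");
        }
    }
    return html;
}
</cfscript>
```

localHost should be a non-empty host fragment such as your own domain name; relative links like href="/about" never match the pattern and are left alone.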
I know it's not ColdFusion, but you could use jQuery to do this for you quite easily (assuming all external links start with http):

$('a[href^="http://"]').attr("target", "_blank");

Or if you want to get fancy:

$('a[href^="http://"]').attr({target: "_blank", title: "Opens in a new window"});

Hope that's of interest.
# Posted By John Whish | 6/11/08 3:21 PM
@John: Very much of interest. To quote the great Paris: "That's hot."
# Posted By Raymond Camden | 6/11/08 3:38 PM
It's nice to teach you something Raymond after all I've learnt from you :)
# Posted By John Whish | 6/11/08 3:46 PM
Just noticed my comment didn't come out right. There shouldn't be a semi-colon after the http. I'll try posting again in case it was my typo!

$('a[href^="http://"]').attr({target: "_blank", title: "Opens in a new window"});
# Posted By John Whish | 6/11/08 3:54 PM
I've posted the code here (with a bonus feature!), if anyone's interested:
http://www.aliaspooryorik.com/blog/index.cfm/e/pos...

:)
# Posted By John Whish | 6/11/08 4:16 PM
My front end is a Flex application. I thought that if I did the modification on the back end before the links got called, it would save time and coding on the front end.

I assume I would have to have an ExternalInterface in the Flex ActionScript to communicate with the jQuery code? It would be great if it would automatically detect and append the code.

I do have to append a user code to the end of each external link, so I am interested in how jQuery works. I haven't started this part of the project yet; where is the best place to learn about it?

Thanks for your help.
# Posted By Duane Hardy | 6/12/08 8:50 AM