Text Cleaning

CleanHaven Releases

Available Now CleanHaven 2.5 — 19 October 2012
[New] You can now remove all email addresses, or as an option, remove all text except the email addresses.
[New] You can now view the invisible characters.
[New] You can now spell check words against every dictionary. It returns Correct if the word is in one or more dictionaries and Incorrect if the word is not in any of the available dictionaries.
[New] You can now run the application at Full Screen on MacOS X.
[New] You can now return a list of all the Dictionary languages that each word is listed in.
[New] You can now remove excess tabs.
[New] You can now remove all ‘http://…’ and ‘www…’ web addresses, or as an option, remove all text except the web addresses.
[New] You can now look up geolocation data based on Apple’s Core Location, Google and Yahoo APIs.
[New] You can now find the number of days from a date until Christmas.
[New] You can now Encrypt and Decrypt your text using a PassKey and Salt text.
[New] You can now convert words to Alphagrams (the letters of each word are sorted) by word or by paragraph.
[New] You can now convert paragraphs to Palindromes (the letters are written in reverse order).
[New] You can now convert diacritical text to its non-diacritical counterparts e.g. ç becomes c (lowercase) or É becomes E (uppercase).
[New] You can now choose to convert to Anagrams by word or by paragraph.
[New] You can now check the spelling based on an expanded number of built-in dictionaries.
[New] You can now calculate the number of work days between two dates (excludes Saturdays and Sundays)
[New] You can now calculate the number of weekends between two dates (includes only Saturdays and Sundays)
[New] You can now calculate the number of any particular day between two dates (i.e. ‘Mondays’ through ‘Sundays’)
[New] When exporting the text to an SQLite database, TEXT fields are now automatically set to COLLATE NOCASE rather than the default COLLATE BINARY. This means that text searches are no longer case-sensitive for ASCII characters so searching for ‘steve’ will match ‘Steve’.
[New] The dictionary now includes the option to search for swear words within the text.
[New] The Dictionaries now support accented characters.
[New] The Dictionaries are now much more extensive and support the following languages: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, SwearWords and Swedish
[New] The application is now a Cocoa application, giving you contextual options.
[New] Now saves your Preferences and font size between sessions.
[New] Compatible with Retina Display.
[New] Compatible with MacOS X Mountain Lion.
[New] Added the ability to sort words within a paragraph (as suggested by Rainer Baersdorf)
[New] You can now remove all Page Numbers (as suggested by Michael Kristensen).
[Fix] When there are no options available the Options window description takes up the whole window.
[Fix] The UK/USA Postcode to Region conversion now allows you to pick the individual item to return (All Data, City, Country, Latitude/Longitude, Latitude, Longitude, Region, State or SubRegion).
[Fix] The Styled text and Table functions are no longer supported.
[Fix] The HTML Name, HTML Number and URL encoding conversions are now much more reliable.
[Fix] The case of the First Name no longer matters when looking up the Salutation.
[Fix] Options window doesn’t show unless there are entries available.
[Fix] Finding the number of days left in the year now works for months with fewer than 31 days.
[Fix] Converting Anagrams no longer freezes when converting multiple character words of the same letter.
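The COLLATE NOCASE change described above for SQLite exports can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the table and column names are hypothetical and are not taken from CleanHaven's actual export.

```python
import sqlite3

# In-memory database; table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE binary_names (name TEXT)")  # default COLLATE BINARY
conn.execute("CREATE TABLE nocase_names (name TEXT COLLATE NOCASE)")
for table in ("binary_names", "nocase_names"):
    conn.execute(f"INSERT INTO {table} (name) VALUES ('Steve')")


def count_matches(table, needle):
    """Count rows whose name equals needle under the column's collation."""
    query = f"SELECT COUNT(*) FROM {table} WHERE name = ?"
    return conn.execute(query, (needle,)).fetchone()[0]


print(count_matches("binary_names", "steve"))  # 0: BINARY is case-sensitive
print(count_matches("nocase_names", "steve"))  # 1: NOCASE folds ASCII case
```

NOCASE only folds the 26 ASCII letters, which is why the note above limits the claim to ASCII characters.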

CleanHaven 2.4 — 21 October 2011
[New] You can now wrap the selected text with an HTML expression e.g.
[New] You can now wrap Text after a set number of characters
[New] You can now view an Options window showing the latest clicked Clean or Save method with a Description and Example of the Clean.
[New] You can now Transpose columns and rows, so Rows become Columns and Columns become Rows.
[New] You can now see the text differences between two similar sets of text.
[New] You can now Save the Styled Text window as a Rich Text Format (RTF) file.
[New] You can now save from the Source window areas as well as the Results window. The Results Table can be saved as XLS, CSV, Tab-Delimited or as a SQLite database.
[New] You can now Revise the Negative Verbs to more or less formal forms.
[New] You can now remove ASCII control characters.
[New] You can now jumble each word, creating Anagrams.
[New] You can now format a number with a specified text format e.g. Currency to “$###,###,##0.00;($###,###,##0.00);$0.00”
[New] You can now find out the difference between two dates and times, or between one date and time and the current date and time.
[New] You can now Find and Replace past a set number of characters
[New] You can now Find and Replace between a set of start and end characters
[New] You can now Find and Replace a set number of characters from the Start or End of the text
[New] You can now display all 85,000+ USA surnames from the 2010 Census.
[New] You can now display all 5000+ USA and most common England and Wales male and female first names from the 2010 Census.
[New] You can now determine the Text Encoding method used by any text
[New] You can now determine the Frequency of Occurrences of a set text within each paragraph.
[New] You can now determine the distance between two USA or UK Postcodes. It will look up their latitude and longitude for you and do the calculation.
[New] You can now determine the distance between two latitude and longitude pairs. It uses the ‘Haversine Method’.
[New] You can now count the frequency of all individual letters.
[New] You can now convert to Title Case but exclude Conjunctions.
[New] You can now convert to and from US/UK period placement (.” or ”.)
[New] You can now convert Text to the Number of Words in the text
[New] You can now convert Text to the Number of Letters in the text
[New] You can now convert Text to Symbols e.g. Copyright symbol (c) to ©, etc
[New] You can now convert Text to Soundex format
[New] You can now convert text to Pig Latin or Kids Code (idsKay odeCay) and back.
[New] You can now convert text to phone keypad numbers.
[New] You can now convert text to Morse Code.
[New] You can now convert text to Leet Speak (L33T Speak) e.g. E=3, O=0, etc.
[New] You can now convert Text to its Numerical value
[New] You can now convert Text to its Hash Value number
[New] You can now convert Text to Double Metaphone format.
[New] You can now convert text by shifting up or down one or more ASCII values.
[New] You can now convert text to Butcher’s Talk (letters in each word are reversed)
[New] You can now convert text to 96 different Text Encoding types UTF8, UTF16, etc
[New] You can now convert Text so it is compliant within MySQL or SQLite text searches
[New] You can now convert Tabs to Spaces
[New] You can now convert numbers into their word equivalents e.g. 1234.5 becomes one thousand, two hundred and thirty four point five.
[New] You can now convert non-breaking spaces to normal spaces
[New] You can now convert HTML Number to Text
[New] You can now convert dates to short, medium, long or universal formats
[New] You can now convert Bullets to hyphens.
[New] You can now convert a Date to the Number of Days left in that Year
[New] You can now convert a Date to the Day Number in that Year
[New] You can now convert a Date to the Day Name for that week e.g. Wednesday.
[New] You can now convert a Date to its Week Number in that year
[New] You can now convert a Bible reference to the actual verse.
[New] You can now clean up the text as Whole Text, Paragraph or by Letter.
[New] You can now add or subtract any number of years, months, days, hours, minutes or seconds to any date and time.
[New] This version has been compatibility tested to run on MacOS X 10.7 Lion, Windows XP/Vista/7 and Linux Ubuntu 11.
[New] There is now an option to Speak the selected text, or all text if nothing is selected. This works for the main Clean and Results windows.
[New] There is now a Help tip on the Source and Clipboard buttons on the Results window to explain what they do.
[New] There is now a Clear command to wipe the main Clean window text and the Results window.
[New] Opening a CSV file (.csv extension) now converts to tab-delimited text. If you want it to remain as it is in the file, change the extension to something else (like .txt). The text now loads into the Styled and table formats properly.
[New] Macintosh version 2.3 available in Apple’s App Store
[New] CleanHaven now always displays a progress window when cleaning extremely long lists.
[Fix] When viewing some HTML text as a Web Page it now uses the default browser.
[Fix] When looking up the Salutation it now uses a database with over 5000 first names (it was 1000 previously).
[Fix] When determining the Frequency of items in a list it now displays both the Number of Different items and the Total Number of items.
[Fix] When cleaning up a phone number you can now choose the position of the space from the right hand side
[Fix] When changing Views the character counter below now adjusts to reflect the new view.
[Fix] The postcodes database has been updated with some recent UK postcode changes as well as latitude and longitude values.
[Fix] Converting massive text fields (> 1 million characters) is now much faster and no longer slows down towards the end.
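The ‘Haversine Method’ mentioned above for the distance between two latitude and longitude pairs can be sketched as follows. This is a minimal illustration, not CleanHaven's own code; the function name and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on opposite sides of the equator are half a circumference apart.
print(haversine_km(0.0, 0.0, 0.0, 180.0))
```

For the postcode variant, the latitude and longitude would first be looked up for each postcode and then passed to the same calculation.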

CleanHaven 2.3 — 6 January 2011
[New] CleanHaven submitted to Apple App Store.

CleanHaven 2.2 — 6 January 2011
[Fix] Combining columns no longer places a comma on the left if the lefthand column is blank.
[Fix] Formatting as Phone Number now removes the round brackets and plus signs.
[Fix] Names beginning with salutations (such as Missy) are no longer confused as Salutations when converting to First/Last Names.
[Fix] Showing Duplicates no longer shows one extra non-duplicate.
[Fix] The splash screen has been removed so it is faster to launch.
[Fix] When determining the Full Name, double-barrelled Surnames are now hyphenated.
[New] A pro option in the Setting tab now allows for the clean operation to apply ONLY to selected text rather than the whole source text window.
[New] Now provides a Total under the Frequency.
[New] Pro option to auto-paste clipboard text into main window, clean the text, then auto-copy text from Results to clipboard on each clean
[New] There is now a popup to check your text amongst all dictionaries: English, English (Lite), English (British), French, German, Italian, Spanish and Icelandic. Please note that these dictionaries have been sourced by third parties and may be missing certain words.
[New] There is now a Pro keyboard hot-key (Command-L for Macintosh, Control-L for Windows and Linux) to perform the current clean on the clipboard text (via Keyboard System Preference)
[New] There is now a Pro option to convert all HTML Names to ASCII text.
[New] There is now a Pro option to convert all HTML Numbers to ASCII text.
[New] There is now a Pro option to convert all Text to URL Encoding.
[New] There is now a Pro option to convert all Text to HTML Numbers.
[New] There is now a Pro option to convert all Text to HTML Names.
[New] There is now a Pro option to convert all URL Encoding to Text.
[New] There is now a Pro option to Convert Styled text to HTML. You must be in the styled View for this to work.
[New] There is now a Pro option to find survey statistical information based on USA Postcodes.
[New] There is now a Pro option to find the City, County, State and Regional information based on USA & UK Postcodes.
[New] There is now a Pro option to find the Latitude and Longitude information based on USA & UK Postcodes.
[New] There is now a Pro option to remove all standard HTML scripts leaving the plain visible text.
[New] There is now a Quick Find button on the Find Tab to find the next occurrence of the Find text within the main window.
[New] There is now a way to Replace any quotes with any other European or world quotes and back. The country presets include: Afrikaans 1, Afrikaans 2, Albanian, Basque, Belarusian 1, Belarusian 2, Bulgarian 1, Bulgarian 2, Catalan 1, Catalan 2, Chinese (Simplified) 1, Chinese (Simplified) 2, Chinese (Traditional) 1, Chinese (Traditional) 2, Croatian 1, Croatian 2, Czech 1, Czech 2, Danish 1, Danish 2, Danish 3, Dutch 1, Dutch 2, English (UK), English (US), Esperanto, Estonian 1, Estonian 2, Finnish 1, Finnish 2, French 1, French 2, French 3, French (Swiss), Georgian 1, Georgian 2, German 1, German 2, German (Swiss), Greek 1, Greek 2, Hebrew 1, Hebrew 2, Hungarian, Icelandic, Indonesian, Irish, Italian 1, Italian 2, Italian (Swiss), Japanese, Korean 1, Korean 2, Latvian, Lithuanian 1, Lithuanian 2, Macedonian, Norwegian 1, Norwegian 2, Polish 1, Polish 2, Portuguese (Brazil), Portuguese (Portugal) 1, Portuguese (Portugal) 2, Romanian 1, Romanian 2, Russian, Serbian 1, Serbian 2, Serbian 3, Slovak 1, Slovak 2, Slovene 1, Slovene 2, Sorbian, Spanish 1, Spanish 2, Swedish 1, Swedish 2, Thai, Turkish 1, Turkish 2, Ukrainian 1, Ukrainian 2, Welsh 1 and Welsh 2.
[New] There is now an option to convert 'hyphen to em-dash or en-dash' and 'em/en-dash to hyphen'.
[New] There is now an option to Convert curly quotes back to straight quotes.
[New] There is now an option to remove ‘All ASCII’ to show only characters higher than decimal 127.
[New] There is now the option to buy and register a serial number to upgrade to CleanHaven Pro.
[New] Title Case now has an option to not capitalise conjunctions i.e. Words such as: after, although, and, as, as if, as long as, because, before, but, even if, even though, for, if, nor, once, or, since, so, so that, the, though, till, unless, until, what, when, whenever, wherever, whether, while, yet, etc.
[New] There is now a Pro option to extract the contents of all quoted text, such as from within a page of HTML. This will return all quoted text, not just visible text, so it will return lots of text.
[New] You can now Preview your HTML Text in an HTML window.
[New] You can now show a complete HTML Text Conversion Table.
[New] You can now view your Text in Binary format (e.g. < (Decimal 60) as 00111100).
[New] You can now view your Text in Hexadecimal format (e.g. < (Decimal 60) as 3C).
[New] You can now view your Text in Octal format (e.g. < (Decimal 60) as 074).
[New] You now have an option to Invert the capitalisation of the text.
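The Binary, Hexadecimal and Octal views listed above can be reproduced with a short sketch. The function name is hypothetical; the padding widths follow the examples given in the notes (8 binary digits, 2 hexadecimal digits, 3 octal digits).

```python
def char_views(ch):
    """Return the decimal, binary, hexadecimal and octal forms of one character."""
    code = ord(ch)
    return {
        "decimal": code,
        "binary": format(code, "08b"),       # zero-padded to 8 bits
        "hexadecimal": format(code, "02X"),  # uppercase, 2 digits
        "octal": format(code, "03o"),        # zero-padded to 3 digits
    }


print(char_views("<"))
# {'decimal': 60, 'binary': '00111100', 'hexadecimal': '3C', 'octal': '074'}
```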

CleanHaven 2.1 — 19 August 2010
[New] You can now perform most functions by words as well as paragraphs. The Cleaned Results window will reflect the results word by word.
[New] You can now toggle between Table and Text views in the Settings tab (or typing Command-T). Choose whether all columns are affected by Convert and Replace actions or just a particular column.
[New] When formatting Phone numbers CleanHaven asks for the position of the space divider.
[New] The text fields now remove all formatting when text is pasted into them.
[New] The Info menu can now show the Frequency of Paragraphs or Words from most to least common.
[New] The CleanHaven window displays a counter of the text variables. The number of variables is reduced when viewing over 100,000 characters (to reduce the slowdown).
[New] The Cleaned Results Text window can now be saved as a plain text file.
[New] The Cleaned Results Table window can now be saved as an Excel XML, tab-delimited or CSV file.
[New] Optionally the English dictionary can be replaced with any one of the French, German, Italian, Spanish, Icelandic and British English dictionaries (download the AllDictionaries.zip file, rename the chosen dictionary with filename ‘words’ and copy over the previous ‘words’ file in the Resources folder of the application. Mac users will have to right-click the application and choose ‘Show Package Contents’ first).
[New] Now uses the English ‘YAWL 3’ dictionary with 264,058 words (rather than the previous ‘SCOWL 6 lite’ dictionary with only 77,676 words).
[New] Menu Commands can now bring the CleanHaven or Cleaned Results to the foreground.
[New] Auto-suggests conversion of Linefeeds to Returns in text (NeoOffice and OpenOffice use linefeeds as delimiters).
[New] An Open dialog now opens any text file. If the file contains a ‘.csv’ extension the data is parsed as a CSV file.
[New] A Clipboard button on the Cleaned Results window copies the whole Results text to the Clipboard. If you need only a portion, then highlight that portion then use the normal Copy command.
[Fix] Title Case now makes capitals after ‘-’, ‘.’, ‘(’ and ‘/’. Words such as ‘of’ are made lowercase. Paragraphs ending with UK, ‘ UK ’ and ‘(UK)’ have this extension made all uppercase.
[Fix] The Value conversion has been moved from the Personal menu to the Info menu.
[Fix] The UK postcodes have been updated to be more accurate.
[Fix] The Text Information calculation of the number of words and paragraphs has been improved.
[Fix] The text field font is now much smaller.
[Fix] The Salutations have been updated to be more accurate and some non-gender-specific first names have been removed.
[Fix] The Phone Number formatting now takes into account any extension.
[Fix] The Cleaned Results text is now read-only.
[Fix] On Windows, after a Clean the text window now is scrolled to the top of the list.
[Fix] Fixed a problem where removing excess spaces was removing spaces before plain quotes.
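The Frequency of Words listing from the 2.1 notes above (most to least common) can be sketched with Python's collections.Counter. Splitting words on whitespace is an assumption; the notes do not say how CleanHaven tokenises text.

```python
from collections import Counter


def word_frequency(text):
    """Return (word, count) pairs ordered from most to least common."""
    # Assumption: words are separated by whitespace and compared exactly.
    return Counter(text.split()).most_common()


print(word_frequency("b a b c a b"))
# [('b', 3), ('a', 2), ('c', 1)]
```

The same approach works for paragraphs by splitting on line breaks instead of whitespace.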

CleanHaven 2.0 — 6 May 2010
[New] Checks for new version on HMS server.
[New] Windows now uses an installer
[Fix] Advertising window removed.
Case:
[New] You can now convert any base text into title case (first letter of each word is uppercase, the rest is lowercase), sentence case (first letter of each sentence is uppercase, the rest is lowercase), uppercase, lowercase, random case (each time you choose it each letter is randomly uppercase or lowercase) and curly quotes (plain quotes are changed to the 66/99 alternatives).
Sort:
[New] You can now sort any base text into ascending order (A to Z), case-sensitive ascending order (A-Z then a-z), descending order (Z-A), random order (sorts each paragraph differently each time you click Clean), reverse order (the opposite of however it is now sorted) and numerical order (sorts by the full numerical value rather than order of the text e.g. 0-9, 10-19 whereas alphabetically ‘10’ comes after ‘1’ and before ‘2’).
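The difference between alphabetical and numerical order described above can be shown with a short sketch. The function names are hypothetical, and the numerical version assumes every paragraph parses as a number.

```python
def sort_alphabetical(paragraphs):
    """Sort as text: compares character by character."""
    return sorted(paragraphs)


def sort_numerical(paragraphs):
    """Sort by full numerical value (assumes each paragraph is a number)."""
    return sorted(paragraphs, key=float)


items = ["10", "2", "1", "19"]
print(sort_alphabetical(items))  # ['1', '10', '19', '2']
print(sort_numerical(items))     # ['1', '2', '10', '19']
```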
Duplicates:
[New] You can now remove all paragraphs that have one or more duplicate paragraphs, show only those paragraphs that have one or more duplicates and show only those paragraphs that are unique.
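The three duplicate modes above can be sketched as follows. This is one possible reading of the notes: "remove" is assumed to keep the first copy of each paragraph, and "show duplicates" is assumed to list each repeated paragraph once.

```python
from collections import Counter


def remove_duplicates(paragraphs):
    """Keep the first copy of each paragraph, preserving order."""
    return list(dict.fromkeys(paragraphs))


def show_duplicates(paragraphs):
    """List each paragraph that occurs more than once (once each)."""
    counts = Counter(paragraphs)
    return [p for p in dict.fromkeys(paragraphs) if counts[p] > 1]


def show_unique(paragraphs):
    """List only the paragraphs that occur exactly once."""
    counts = Counter(paragraphs)
    return [p for p in paragraphs if counts[p] == 1]


lines = ["a", "b", "a", "c"]
print(remove_duplicates(lines))  # ['a', 'b', 'c']
print(show_duplicates(lines))    # ['a']
print(show_unique(lines))        # ['b', 'c']
```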
Remove:
[New] You can now remove all excess returns (where one or more carriage return characters appear at the start or end, or double-spaces appear in the middle of the base text).
[New] You can now remove all excess spaces (where one or more space characters appear at the start or end, or double-spaces appear in the middle of the base text).
[New] You can now remove all line feeds (where one or more line feed characters appear anywhere in the base text).
[New] You can now convert all line feed characters to carriage returns e.g. some programs such as OpenOffice use line feeds as delimiters rather than carriage returns.
[New] You can now remove all non-ASCII characters i.e. those that are not within the first 128 ASCII values.
[New] You can now remove all non-letter characters i.e. that are not within the a-z and A-Z range.
[New] You can now remove all non-number or space characters i.e. leaving you with only numerical data.
[New] You can now remove all periods.
[New] You can now remove all punctuation characters.
[New] You can now remove all carriage returns.
[New] You can now convert all carriage returns to line feed characters e.g. some programs such as OpenOffice use line feeds as delimiters rather than carriage returns.
[New] You can now remove all tab characters e.g. though invisible these can cause a problem when pasting text into spreadsheets.
[New] You can now see the numerical value of each paragraph e.g. Get the values of a spreadsheet column containing currency, commas or text.
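Two of the Remove operations above, excess spaces and non-ASCII characters, can be sketched in a few lines. The function names are hypothetical, and the sketch treats its input as a single paragraph, which is an assumption about how the clean is applied.

```python
import re


def remove_excess_spaces(paragraph):
    """Collapse runs of spaces and trim spaces at the start and end."""
    collapsed = re.sub(r" {2,}", " ", paragraph)
    return collapsed.strip(" ")


def remove_non_ascii(text):
    """Keep only characters within the first 128 ASCII values."""
    return "".join(ch for ch in text if ord(ch) < 128)


print(remove_excess_spaces("  hello   world  "))  # hello world
print(remove_non_ascii("café"))                   # caf
```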
Personal:
[New] You can now combine columns separated by tabs e.g. converts multiple-columns of spreadsheet address data into one column of data ‘1 Main Street’ & ‘New York’ becomes ‘1 Main Street, New York’ replacing the tab with a comma where necessary. Empty columns are ignored.
[New] You can now combine paragraphs separated by superfluous carriage returns e.g. some email programs wrap text after 70 characters wide making this text very hard to edit in a text editor.
[New] You can now convert a column of email addresses to their web equivalent e.g. ‘info@apple.com’ becomes ‘info@apple.com www.apple.com’ so the data can be pasted back into a spreadsheet.
[New] You can now split a single column of first/last name pairs into separate columns. It converts them to title case and, if the salutation is missing, tries to find the correct one based on over 1000 common first names e.g. ‘john smith’ becomes ‘Mr John Smith’, so the data can be pasted back into a spreadsheet.
[New] You can now convert a single column of phone numbers into a common ##### ###### format. It removes all excess spaces then places a space six characters from the right (unless there is a phone extension). This may be useful when searching for phone numbers in a CRM such as CRMHaven.
[New] You can now look up the salutation for a single column of first names based on a database of over 1000 common first names. Male names are given the salutation ‘Mr’ and female names are given the salutation ‘Ms’.
[New] You can now look up the location details for a single column of UK postcodes from a database of over 2800 UK postcodes. For each postcode, five values are returned as ‘Postcode City County Region Country’, suitable for pasting back into a spreadsheet. Note that it only searches up until the first space character so the City/Town name is only approximate. Contact HMS if you want your country’s postcodes included.
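The phone number formatting described above (remove the spaces, then place a space six characters from the right) can be sketched as follows. The function name and the `group` parameter are hypothetical, and the sketch does not handle the phone extension case mentioned in the notes.

```python
def format_phone(number, group=6):
    """Strip all whitespace, then insert one space `group` characters from the right."""
    digits = "".join(number.split())
    if len(digits) <= group:
        return digits  # too short to split
    return digits[:-group] + " " + digits[-group:]


print(format_phone("0123 4567 890"))  # 01234 567890
```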
Info:
[New] You can now check the spelling on a list of English words based on a database of a few thousand words. It will return either a list of only the correctly spelled words, or only the incorrectly spelled words or a list of the original words with a tab separating whether the original word is correct or incorrect so it can be pasted into a spreadsheet. Note that the dictionary is not exhaustive and there may be correctly spelled words that are marked as incorrect.
[New] You can now decipher the ASCII values of a set of text characters e.g. ‘The Cat Sat’ becomes T«84»h«104»e«101» «32»C«67»a«97»t«116» «32»S«83»a«97»t«116»«13»
[New] You can now get information about the text including the number of words, double-spaces, characters, non-space-characters, paragraphs, linefeed, escape and tab characters.
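The ASCII value listing shown in the ‘The Cat Sat’ example above can be reproduced with a one-line sketch. The function name is hypothetical; the «…» markers follow the format used in the notes.

```python
def ascii_values(text):
    """Follow each character with its code value in «…» markers."""
    return "".join(f"{ch}«{ord(ch)}»" for ch in text)


print(ascii_values("The Cat Sat"))
# T«84»h«104»e«101» «32»C«67»a«97»t«116» «32»S«83»a«97»t«116»
```

The trailing «13» in the notes' example corresponds to a carriage return at the end of the paragraph, which this sketch would also report if the input ended with one.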
Anagram Releases

Available Now Anagram 1.2 — 6 March 2014
[Fix] Bug fixes
[New] Added new Wikipedia entries for alpha-grams, palindromes, Phone Keypad Numbers, Binary, Hexadecimal and Octal.
[New] Added the ability to copy the Converted text to the Source text window.
[New] Added the ability to jumble paragraphs (as well as words) into Anagrams
[New] Added the ability to sort paragraphs into Palindromes (letters reversed). We could already do this by word with Butcher’s Talk.
[New] Added the ability to sort words and paragraphs into Alpha-grams (letters sorted alphabetically)
[New] Added the ability to translate words into Phone Keypad Numbers
[New] Added the ability to view letters as their ASCII number, Binary, Description, Hexadecimal or Octal formats.
[New] Font size can now be up to 48 points.
[New] Improved the layout of the Wikipedia windows.
[New] Reconfigured screen layout to be horizontal
[New] Windows can now display full screen.
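Three of the conversions listed above for Anagram 1.2 can be sketched briefly: Alpha-grams (letters sorted alphabetically), reversed letters, and Phone Keypad Numbers. The function names are hypothetical, and the keypad mapping assumes the standard telephone layout, which may differ from the application's own table.

```python
def alphagram(word):
    """Sort the letters of a word alphabetically."""
    return "".join(sorted(word))


def reverse_letters(text):
    """Write the letters in reverse order."""
    return text[::-1]


# Assumed standard telephone keypad layout.
KEYPAD_GROUPS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
                 "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {letter: digit
                   for digit, letters in KEYPAD_GROUPS.items()
                   for letter in letters}


def to_keypad(text):
    """Replace each letter with its keypad digit; leave other characters as-is."""
    return "".join(LETTER_TO_DIGIT.get(ch.lower(), ch) for ch in text)


print(alphagram("listen"))          # eilnst
print(reverse_letters("stressed"))  # desserts
print(to_keypad("Hello"))           # 43556
```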

Anagram 1.0 — 18 February 2012
[New] Released on Apple App Store and locked URL
Change Case Releases

Available Now Change Case 1.2 — 7 March 2014
[Fix] Bug fixes
[Fix] Sentence case and Title case now capitalise the beginning of new sentences properly.
[New] Added new Wikipedia entries for all conversion types.
[New] Added the ability to copy the Converted text to the Source text window.
[New] Font size can now be up to 48 points.
[New] Improved the layout of the Wikipedia windows.
[New] Reconfigured screen layout to be horizontal
[New] Windows can now display full screen.
[New] You can now convert diacritical text to its non-diacritical counterparts e.g. ç becomes c (lowercase) or É becomes E (uppercase).
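The diacritical conversion above (ç becomes c, É becomes E) can be sketched with Python's unicodedata module. This is one common technique, Unicode decomposition followed by dropping the combining marks, and is not necessarily how Change Case implements it.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("ç"))  # c
print(strip_diacritics("É"))  # E
```

Letters that have no Unicode decomposition, such as ø or ß, pass through this sketch unchanged and would need an explicit mapping.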

Change Case 1.0 — 3 February 2012
[New] Released on Apple App Store and locked URL
Data Cleansing Releases

Available Now Data Cleansing 1.1 — 21 September 2012
[New] Compatible with MacOS X Mountain Lion.
[New] Added the ability to clean one field based on the value in another field
[New] Added the ability to add a MySQL WHERE command. This allows you to limit what records are considered for cleansing.
[New] Added the ability to determine the Salutation based on the First Name.
[New] Added the ability to look up data based on a UK or USA postcode. Updates include City, State/County, Country, Region, SubRegion, Latitude and Longitude.
[New] Program is freely downloadable from the web site and can be run twice before needing a purchase.

Data Cleansing 1.0.1 — 24 July 2012
[Fix] Fixed a bug where Preview would crash if the MySQL data being received had accented characters. Also the accented values would not Update correctly. It now asks the MySQL server to provide the data in UTF-8 format.
[New] Re-submitted to Apple App Store and locked URL
[New] v1.0 Available in Apple App Store

Data Cleansing 1.0 — 31 May 2012
[New] Submitted to Apple App Store and locked URL
DeDuplicate Releases

Available Now DeDuplicate 1.2 — 12 March 2014
[Fix] Bug fixes
[New] Added new Wikipedia entries for all conversion types.
[New] Added the ability to copy the Converted text to the Source text window.
[New] Added the ability to get a frequency count from each duplicate paragraph.
[New] Font size can now be up to 48 points.
[New] Improved the layout of the Wikipedia windows.
[New] Reconfigured screen layout to be horizontal
[New] Windows can now display full screen.
[New] Added the ability to de-duplicate as case-sensitive — ‘ABC’ and ‘AbC’ are not considered duplicates.
[New] Added the ability to remove blank lines from the duplication.
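The case-sensitive and blank-line options above can be sketched as follows. The function and parameter names are hypothetical, and treating "remove blank lines" as dropping them from the result is an assumption.

```python
def deduplicate(paragraphs, case_sensitive=True, skip_blank=False):
    """Keep the first copy of each paragraph, with optional case folding."""
    seen = set()
    result = []
    for paragraph in paragraphs:
        if skip_blank and not paragraph.strip():
            continue  # drop blank lines entirely
        key = paragraph if case_sensitive else paragraph.casefold()
        if key not in seen:
            seen.add(key)
            result.append(paragraph)
    return result


lines = ["ABC", "AbC", "ABC"]
print(deduplicate(lines))                        # ['ABC', 'AbC']
print(deduplicate(lines, case_sensitive=False))  # ['ABC']
```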

DeDuplicate 1.0 — 8 February 2012
[New] Released on Apple App Store and locked URL
Find And Replace Releases

Available Now Find And Replace 1.2 — 21 March 2014
[Fix] Bug fixes
[Fix] In Find and Replace ‘Characters from Start’ now works correctly, replacing the right number of characters, not one too many.
[New] Added new RegEx searching engine (PCRE 8.33)
[New] Added new Wikipedia entry for Regular Expressions (RegEx).
[New] Added the ability to copy the Converted text to the Source text window.
[New] Added the ability to view the text differences between the two panels graphically. You can scan through each occurrence with each difference highlighted below.
[New] Font size can now be up to 48 points.
[New] Improved the layout of the Wikipedia windows.
[New] Reconfigured screen layout to be horizontal
[New] When scanning through a list of found items, you can now click on a Previous button to search back up the text from the current selection.
[New] Windows can now display full screen.
[New] You now have a button to speak the converted text.

Find And Replace 1.0 — 15 February 2012
[New] Released on Apple App Store and locked URL
Its About Time Releases

Available Now Its About Time 1.0 — 11 February 2012
[New] Released on Apple App Store and locked URL
Sort It Out Releases

Available Now Sort It Out 1.0 — 5 February 2012
[New] Released on Apple App Store and locked URL