Chris Fay & Jennie Fay’s Webportal


Fix odd characters in WordPress posts

I noticed recently that my WordPress data had some funky characters in almost every post:

Examples:

â€™
â€¦

After browsing through the posts, it looks like those characters stand in for things like quotation marks, dashes and double periods. Various forums said this is caused by the database encoding and collation; however, after following the recommendations of sites like http://www.mydigitallife.info/2007/0…to-version-22/ I was still unable to get rid of the huge number of garbled characters. Ultimately, I spent some time mapping each set of characters to the appropriate one (quotes, commas, etc.) and built a PHP script to:

1. Loop through every post in the Wordpress database
2. Find and replace all the garbled characters
3. Re-insert the data
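
Step 2 can be tried on a plain string before any database is involved. The following is a minimal sketch, not the script from this post: the function name `cleanse_text` and the three-entry map are made up for illustration, and it swaps in `strtr()` for chained `str_replace()` calls, because `strtr()` with an array tries the longest matching key first and never re-scans text it has already replaced.

```php
<?php
// Hypothetical helper: replace garbled sequences using a lookup map.
// The map below is illustrative only; extend it with whatever shows up in your posts.
function cleanse_text(string $text, array $map): string
{
    // strtr() with an array prefers the longest matching key at each position
    // and does not re-scan replaced text, so the order of the entries does not matter.
    return strtr($text, $map);
}

$map = [
    "â€™" => "'",   // garbled right single quote
    "â€¦" => "...", // garbled ellipsis
    "â€"  => "",    // leftover two-character prefix
];

echo cleanse_text("Itâ€™s doneâ€¦", $map), "\n"; // prints: It's done...
```

With chained `str_replace()` calls the same map only works if the longer sequences are replaced before the bare `â€` prefix, so the one-pass form is a little more forgiving when you add entries.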

I have attached the script below (it is very simple) in case others find this is the only way to clean out this character issue. One other thing I should mention: this only works if the characters are actually garbled in the database. If the issue is simply how your browser is rendering the text, and not how the strings are stored in the database, then this won't help you, although feel free to give it a shot.
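
One way to tell the two cases apart is to look at the raw bytes rather than at what the browser shows. The sketch below is an illustration under assumptions, not part of the original script: the function name `looks_garbled_in_storage` is hypothetical, it only checks for the `â€` prefix, and it assumes your posts are meant to be UTF-8. A correctly stored right single quote is the three bytes `e2 80 99`; if the data really holds the garbled form, the same spot contains the longer sequence `c3 a2 e2 82 ac e2 84 a2`.

```php
<?php
// Hypothetical check: does this string contain the UTF-8 bytes for "â€"
// (C3 A2 E2 82 AC)? If so, the garbling is in the data itself, not in the browser.
function looks_garbled_in_storage(string $raw): bool
{
    return strpos($raw, "\xC3\xA2\xE2\x82\xAC") !== false;
}

$storedCorrectly = "It\xE2\x80\x99s";                     // "It’s" as valid UTF-8
$storedGarbled   = "It\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2s"; // "Itâ€™s" stored as such

echo bin2hex($storedCorrectly), "\n"; // 4974e2809973
echo bin2hex($storedGarbled), "\n";   // 4974c3a2e282ace284a273
var_dump(looks_garbled_in_storage($storedCorrectly)); // bool(false)
var_dump(looks_garbled_in_storage($storedGarbled));   // bool(true)
```

Keep in mind that a mismatched connection character set can change the bytes PHP receives, so a cross-check with MySQL's `HEX(post_content)`, which reports the bytes as stored, may be worthwhile.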

One other thing: you may need to add additional characters to replace in the script. How to do so should be self-explanatory from what I have presented already. If you need help, simply post here.
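
To work out what to search for, it can help to reproduce the garbling on purpose. This is a sketch under an assumption the post does not confirm: that the damage comes from UTF-8 bytes being read as Windows-1252, which is the usual source of sequences like `â€™`. The function name `garble` is made up, and the code relies on PHP's `iconv()` accepting the `CP1252` charset name on your system.

```php
<?php
// Hypothetical helper: show how a UTF-8 string looks after its bytes are
// misread as Windows-1252 and saved again as UTF-8.
function garble(string $utf8): string
{
    // Treat each input byte as one Windows-1252 character and encode it as UTF-8.
    return iconv('CP1252', 'UTF-8', $utf8);
}

// Right single quote, left double quote, ellipsis.
foreach (["\u{2019}", "\u{201C}", "\u{2026}"] as $char) {
    echo $char, " becomes ", garble($char), "\n";
}
```

The left-hand side of each output line is the character you want back, and the right-hand side is the string to search for. Characters whose UTF-8 bytes include a value that Windows-1252 leaves undefined (the right double quote is one) may fail or convert differently depending on the iconv implementation, which is one possible reason a bare `â€` fragment shows up in the data.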

<?php
/*
 * cleanseWordpressPosts.php
 * Script used to replace various characters displayed in posts with appropriate values
 * Written by: Chris Fay
 * 6/14/2008
 *
 * To Use: First backup your database!
 * Run the script in test mode to see what data would be cleansed
 * Change testmode to false to allow script to cleanse database 
 *
 * If you need other characters cleansed, simply duplicate one of the lines that look like:
 *
 * $cleansed = str_replace("â€œ", "", $cleansed, $count4);
 *
 * ... and, using the example above, replace the â€œ with the string you want to find,
 * and put the value you want to replace it with inside the second pair of double quotes.
 */
 
$dbhost = 'localhost';
$dbuser = 'dbUserNameHere'; //replace dbUserNameHere with your username
$dbpass = 'dbPasswordHere'; //replace dbPasswordHere with your user's password
$dbname = 'dbName'; //replace dbName with the database name you want to connect to
define('TESTMODE', true); //leave true to get information on what would be updated - the database
						  //will not be touched if set to true - change to false to allow the script to modify the database
 
//the old mysql_* functions were removed in PHP 7, so this uses mysqli instead
$conn = mysqli_connect($dbhost, $dbuser, $dbpass, $dbname) or die('Error connecting to MySQL');

//loop through each record, cleanse it, then write it back
$query  = "SELECT id, post_content FROM wp_posts";
$result = mysqli_query($conn, $query) or die('Error, select query failed');

if(TESTMODE) {
	echo "**** TEST MODE: No data will be modified in your database ****<br /><br />";
}

while($row = mysqli_fetch_assoc($result))
{
	$id = (int) $row['id'];
	$uncleansed = $row['post_content'];

	echo "Cleansing id: $id<br />";

	//full sequences go first; otherwise the bare "â€" line would strip
	//their prefix and they would never match
	$cleansed = str_replace("â€™", "'", $uncleansed, $count1); //garbled ’
	$cleansed = str_replace("ÂÃ", "", $cleansed, $count2);
	$cleansed = str_replace("â€Â", " ", $cleansed, $count3);
	$cleansed = str_replace("â€œ", "", $cleansed, $count4); //garbled “
	$cleansed = str_replace("â€¦", ".", $cleansed, $count5); //garbled …
	$cleansed = str_replace("â€", "", $cleansed, $count6);
	$cleansed = str_replace("â„¢", "'", $cleansed, $count7);
	$cleansed = str_replace("Å“", "", $cleansed, $count8);
	$cleansed = str_replace("˜", "'", $cleansed, $count9);
	$cleansed = str_replace("¦¦", "", $cleansed, $count10);
	$cleansed = str_replace("¦", "", $cleansed, $count11);

	//escape the cleansed text and build the update query once for both modes
	$escaped = mysqli_real_escape_string($conn, $cleansed);
	$query2 = "UPDATE wp_posts SET post_content = '".$escaped."' WHERE id = ".$id;

	if(TESTMODE) {
		echo "ID: $id would have been cleansed. <br />";
		echo "Cleanse count for each string:<br />"
			."â€™: $count1<br />"
			."ÂÃ: $count2<br />"
			."â€Â: $count3<br />"
			."â€œ: $count4<br />"
			."â€¦: $count5<br />"
			."â€: $count6<br />"
			."â„¢: $count7<br />"
			."Å“: $count8<br />"
			."˜: $count9<br />"
			."¦¦: $count10<br />"
			."¦: $count11<br />";

		//escape for display so the post's own HTML is shown rather than rendered
		echo "The update query we'd be using:<br /> "
			.htmlspecialchars($query2, ENT_QUOTES | ENT_SUBSTITUTE)."<br /><br />";
	} else {
		echo "ID: $id cleansed. <br />"
			."Updating data in table<br />";

		mysqli_query($conn, $query2) or die('Error, update query failed');
		echo "Post cleansed and updated<br /><br />";
	}
} //end while

echo "All finished...<br />";

mysqli_close($conn);
?>

