<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Standardization vs. normalization</title>
	<atom:link href="http://www.dataminingblog.com/standardization-vs-normalization/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.dataminingblog.com/standardization-vs-normalization/</link>
	<description>Data mining crossroads - research, applications, news, list of blogs and customized search engine about data mining.</description>
	<lastBuildDate>Fri, 30 Jul 2010 20:48:12 +0200</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Sandro Saitta</title>
		<link>http://www.dataminingblog.com/standardization-vs-normalization/comment-page-1/#comment-599</link>
		<dc:creator>Sandro Saitta</dc:creator>
		<pubDate>Thu, 06 Dec 2007 09:26:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dataminingblog.com/standardization-vs-normalization#comment-599</guid>
		<description>First, you can normalize/standardize your data. Alternatively, you can manually assign a weight to each of these metrics.&lt;br/&gt;&lt;br/&gt;For example, you can use an objective function. Let&#039;s say you want to maximize a function of Expenses, Profits and Turnover. In the objective function, give a negative weight to Expenses and Turnover and a positive one to Profits. I don&#039;t know if this will work for your problem, but that would be my first guess.</description>
		<content:encoded><![CDATA[<p>First, you can normalize/standardize your data. Alternatively, you can manually assign a weight to each of these metrics.</p>
<p>For example, you can use an objective function. Let&#8217;s say you want to maximize a function of Expenses, Profits and Turnover. In the objective function, give a negative weight to Expenses and Turnover and a positive one to Profits. I don&#8217;t know if this will work for your problem, but that would be my first guess.</p>
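<p>A minimal sketch of such an objective function, assuming each metric is first standardized to z-scores and then combined with signed weights. The weights, the function names and the three example companies are made up for illustration. Note that with only two companies every z-score collapses to +1 or -1, so this assumes a larger set of companies.</p>

```python
import statistics

# Hypothetical weights: negative where lower is better, positive where higher is better.
WEIGHTS = {"expenses": -1.0, "profits": 1.0, "turnover": -1.0}


def zscores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant metric carries no information about differences.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def composite_index(companies, weights=WEIGHTS):
    """Return a dict mapping each company name to its weighted sum of z-scores."""
    names = list(companies)
    scores = dict.fromkeys(names, 0.0)
    for metric, weight in weights.items():
        standardized = zscores([companies[name][metric] for name in names])
        for name, z in zip(names, standardized):
            scores[name] += weight * z
    return scores


# Made-up figures, for illustration only.
companies = {
    "A": {"expenses": 100.0, "profits": 50.0, "turnover": 10.0},
    "B": {"expenses": 200.0, "profits": 30.0, "turnover": 20.0},
    "C": {"expenses": 150.0, "profits": 40.0, "turnover": 15.0},
}
scores = composite_index(companies)
best = max(scores, key=scores.get)
```

<p>Unlike a rank, the resulting score keeps the size of the differences between companies. The choice of weights remains a judgment call.</p>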
]]></content:encoded>
	</item>
	<item>
		<title>By: mb</title>
		<link>http://www.dataminingblog.com/standardization-vs-normalization/comment-page-1/#comment-598</link>
		<dc:creator>mb</dc:creator>
		<pubDate>Fri, 30 Nov 2007 00:23:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dataminingblog.com/standardization-vs-normalization#comment-598</guid>
		<description>Will, can you tell me how I can scale linearly so that the 5th and 95th percentiles meet some standard range?&lt;br/&gt;Can this be done with both negative and positive values?&lt;br/&gt;&lt;br/&gt;Another question:&lt;br/&gt;If I want to compute an index where not only the units and scales are different, but the input metrics also have different interpretations - specifically, one metric is better if the values are higher and another one is better if the values are lower - how can I compute an index that represents all numbers concisely and meaningfully?&lt;br/&gt;Let&#039;s say I have Expenses ($), Profits ($) and Turnover (%). Expenses and Turnover are better if lower, but Profits are better if higher.&lt;br/&gt;If I am comparing two companies on these metrics and want to compute one index to show the &quot;best&quot; performing company on these parameters, how can I do this?&lt;br/&gt;Sorry, not strictly data-mining relevant, but I thought someone here might have an answer!&lt;br/&gt;&lt;br/&gt;I tried using z-scores and normalizing, but that doesn&#039;t work because of the different high/low interpretations.&lt;br/&gt;Eventually I used a reverse rank for Expenses and Turnover so that all metrics have the same order. However, a rank does not show the size of the difference between the two companies, just their order!&lt;br/&gt;&lt;br/&gt;This is a great blog; thanks to all for the helpful comments.</description>
		<content:encoded><![CDATA[<p>Will, can you tell me how I can scale linearly so that the 5th and 95th percentiles meet some standard range?<br />Can this be done with both negative and positive values?</p>
<p>Another question:<br />If I want to compute an index where not only the units and scales are different, but the input metrics also have different interpretations &#8211; specifically, one metric is better if the values are higher and another one is better if the values are lower &#8211; how can I compute an index that represents all numbers concisely and meaningfully?<br />Let&#8217;s say I have Expenses ($), Profits ($) and Turnover (%). Expenses and Turnover are better if lower, but Profits are better if higher.<br />If I am comparing two companies on these metrics and want to compute one index to show the &#8220;best&#8221; performing company on these parameters, how can I do this?<br />Sorry, not strictly data-mining relevant, but I thought someone here might have an answer!</p>
<p>I tried using z-scores and normalizing, but that doesn&#8217;t work because of the different high/low interpretations.<br />Eventually I used a reverse rank for Expenses and Turnover so that all metrics have the same order. However, a rank does not show the size of the difference between the two companies, just their order!</p>
<p>This is a great blog; thanks to all for the helpful comments.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Will Dwinnell</title>
		<link>http://www.dataminingblog.com/standardization-vs-normalization/comment-page-1/#comment-551</link>
		<dc:creator>Will Dwinnell</dc:creator>
		<pubDate>Fri, 13 Jul 2007 17:44:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dataminingblog.com/standardization-vs-normalization#comment-551</guid>
		<description>For readers who are not aware of this technique: &quot;Winsorizing&quot; data simply means clamping the extreme values.&lt;br/&gt;&lt;br/&gt;This is similar to trimming the data, except that instead of discarding data, values greater than the specified upper limit are replaced with the upper limit, and those below the lower limit are replaced with the lower limit. Often, the specified range is indicated in terms of percentiles of the original distribution (like the 5th and 95th percentiles).&lt;br/&gt;&lt;br/&gt;This process is sometimes used to make conventional measures more robust, as in the Winsorized variance.</description>
		<content:encoded><![CDATA[<p>For readers who are not aware of this technique: &#8220;Winsorizing&#8221; data simply means clamping the extreme values.</p>
<p>This is similar to trimming the data, except that instead of discarding data, values greater than the specified upper limit are replaced with the upper limit, and those below the lower limit are replaced with the lower limit. Often, the specified range is indicated in terms of percentiles of the original distribution (like the 5th and 95th percentiles).</p>
<p>This process is sometimes used to make conventional measures more robust, as in the Winsorized variance.</p>
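<p>A minimal sketch of the clamping described above, assuming percentile limits computed with Python's standard statistics module. The function name winsorize and its default limits are illustrative choices, and other percentile conventions would give slightly different limits.</p>

```python
import statistics


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clamp values outside the given percentiles to those percentiles.

    lower_pct and upper_pct are assumed to be whole numbers from 1 to 99.
    """
    # n=100 yields 99 cut points; index k - 1 holds the k-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lower, upper = cuts[lower_pct - 1], cuts[upper_pct - 1]
    # Nothing is discarded: the output has the same length as the input.
    return [min(max(v, lower), upper) for v in values]


# Made-up data with one extreme value at the end.
data = list(range(100)) + [10_000]
clamped = winsorize(data)
```

<p>Here the extreme value is pulled in to the 95th percentile rather than dropped, which is the difference from trimming.</p>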
]]></content:encoded>
	</item>
	<item>
		<title>By: Sandro Saitta</title>
		<link>http://www.dataminingblog.com/standardization-vs-normalization/comment-page-1/#comment-550</link>
		<dc:creator>Sandro Saitta</dc:creator>
		<pubDate>Fri, 13 Jul 2007 14:46:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dataminingblog.com/standardization-vs-normalization#comment-550</guid>
		<description>Thanks for your comment, Fay. I agree with you on taking the log. I used to work with data in the range 10^6 to 10^12, for example. And thanks for the remark :-)&lt;br/&gt;&lt;br/&gt;Will, your suggestions seem very interesting. I don&#039;t know the &quot;winsorize&quot; technique, but it seems it could be used in addition to normalization.</description>
		<content:encoded><![CDATA[<p>Thanks for your comment, Fay. I agree with you on taking the log. I used to work with data in the range 10^6 to 10^12, for example. And thanks for the remark <img src='http://www.dataminingblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Will, your suggestions seem very interesting. I don&#8217;t know the &#8220;winsorize&#8221; technique, but it seems it could be used in addition to normalization.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Will Dwinnell</title>
		<link>http://www.dataminingblog.com/standardization-vs-normalization/comment-page-1/#comment-549</link>
		<dc:creator>Will Dwinnell</dc:creator>
		<pubDate>Thu, 12 Jul 2007 22:00:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dataminingblog.com/standardization-vs-normalization#comment-549</guid>
		<description>A few points come to mind:&lt;br/&gt;&lt;br/&gt;1. Monotonic scaling of the data (assuming that distinct values are not collapsed) will have no effect on the most common logical learning algorithms (tree- and rule-induction algorithms).&lt;br/&gt;&lt;br/&gt;2. There are robust alternatives, such as: subtract the median and divide by the IQR, or scale linearly so that the 5th and 95th percentiles meet some standard range.&lt;br/&gt;&lt;br/&gt;3. Outliers (and, technically, high-leverage points) present an interesting challenge. One possibility is to Winsorize the data after scaling it.</description>
		<content:encoded><![CDATA[<p>A few points come to mind:</p>
<p>1. Monotonic scaling of the data (assuming that distinct values are not collapsed) will have no effect on the most common logical learning algorithms (tree- and rule-induction algorithms).</p>
<p>2. There are robust alternatives, such as: subtract the median and divide by the IQR, or scale linearly so that the 5th and 95th percentiles meet some standard range.</p>
<p>3. Outliers (and, technically, high-leverage points) present an interesting challenge. One possibility is to Winsorize the data after scaling it.</p>
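<p>The two robust alternatives in point 2 can be sketched as follows, assuming Python's standard statistics module and its "inclusive" quantile method. The function names robust_scale and percentile_scale and the default target range are illustrative, not part of the original comment.</p>

```python
import statistics


def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero, so this scaling is undefined")
    return [(v - median) / iqr for v in values]


def percentile_scale(values, low=0.0, high=1.0):
    """Scale linearly so the 5th and 95th percentiles land on low and high."""
    # n=20 yields 19 cut points; the first is the 5th percentile,
    # the last is the 95th percentile.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    p5, p95 = cuts[0], cuts[-1]
    if p95 == p5:
        raise ValueError("5th and 95th percentiles coincide")
    slope = (high - low) / (p95 - p5)
    return [low + (v - p5) * slope for v in values]
```

<p>Both transformations are linear, so they work equally well when the data mix negative and positive values. Values beyond the 5th and 95th percentiles fall outside the target range unless they are clamped afterwards.</p>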
]]></content:encoded>
	</item>
	<item>
		<title>By: Fay</title>
		<link>http://www.dataminingblog.com/standardization-vs-normalization/comment-page-1/#comment-548</link>
		<dc:creator>Fay</dc:creator>
		<pubDate>Thu, 12 Jul 2007 13:47:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.dataminingblog.com/standardization-vs-normalization#comment-548</guid>
		<description>Sometimes we can take logarithms of the input data when the values span several orders of magnitude. However, since logarithms are defined for positive values only, we need to take care when the input data may contain zero or negative values.&lt;br/&gt;You did very good work on your blog! :)</description>
		<content:encoded><![CDATA[<p>Sometimes we can take logarithms of the input data when the values span several orders of magnitude. However, since logarithms are defined for positive values only, we need to take care when the input data may contain zero or negative values.<br />You did very good work on your blog! <img src='http://www.dataminingblog.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
</channel>
</rss>
