The Model Complexity Myth
Published: 2019-06-27


(or, Yes You Can Fit Models With More Parameters Than Data Points)

An oft-repeated rule of thumb in any sort of statistical model fitting is "you can't fit a model with more parameters than data points". This idea appears to be as widespread as it is incorrect. On the contrary, if you construct your models carefully, you can fit models with more parameters than data points. This is much more than mere trivia with which you can impress the nerdiest of your friends: as I will show here, it can prove very useful in real-world scientific applications.

A model with more parameters than data points is known as an under-determined system, and it's a common misperception that such a model cannot be solved under any circumstances. In this post I will consider this misconception, which I call the model complexity myth. I'll start by showing where the model complexity myth holds true, first from an intuitive point of view, and then from a more mathematically heavy point of view. I'll build on this mathematical treatment and discuss how underdetermined models may be addressed from a frequentist standpoint, and then from a Bayesian standpoint. (If you're unclear about the general differences between frequentist and Bayesian approaches, I might suggest reading  on the subject). Finally, I'll discuss some practical examples of where such an underdetermined model can be useful, and demonstrate one of these examples: quantitatively accounting for measurement biases in scientific data.
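As a small preview of the frequentist side of that discussion, here is a minimal sketch of my own (not code from the original post). It uses ridge (L2) regularization, which is one standard way to condition an underdetermined linear model; whether it matches the exact approach taken later in the post is an assumption. The synthetic data, the polynomial basis, and the penalty strength `lam` are all illustrative choices.

```python
import numpy as np

# Illustrative setup (assumed values): 5 data points, 10 parameters.
rng = np.random.RandomState(0)
n_points, n_params = 5, 10
x = rng.rand(n_points)
y = np.sin(2 * np.pi * x) + 0.1 * rng.randn(n_points)

# Polynomial design matrix with more columns (parameters) than rows (points).
X = np.vander(x, n_params)          # shape (5, 10)
rank_X = np.linalg.matrix_rank(X)   # can never exceed the number of points

# Ridge regularization: adding lam * I makes the normal matrix invertible,
# so the linear solve has a unique solution despite the underdetermination.
lam = 1e-3                          # hypothetical penalty strength
A = X.T @ X + lam * np.eye(n_params)
theta = np.linalg.solve(A, X.T @ y)

print("rank of X:", rank_X, "with", n_params, "parameters")
print("fitted parameter vector has shape", theta.shape)
```

The penalty term is what selects one solution out of the infinitely many that fit the data; a different `lam` would select a different one, so the choice of conditioning is itself part of the model.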

The Root of the Model Complexity Myth

While the model complexity myth is not true in general, it is true in the specific case of simple linear models, which perhaps explains why the myth is so pervasive. In this section I want to motivate the underdetermination issue in simple linear models, first from an intuitive view, and then from a more mathematical view.
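To make the issue concrete before going further, here is a minimal sketch (my own illustration, not code from the original post) that tries to fit a straight line, with two parameters, to a single data point. The data values and variable names are assumptions chosen for illustration. The normal matrix of the least-squares problem turns out to be rank-deficient, and several quite different lines all pass exactly through the point.

```python
import numpy as np

# One data point, two parameters (slope and intercept): illustrative values.
x = np.array([0.37])
y = np.array([0.95])

# Design matrix for the model y = slope * x + intercept.
X = np.vstack([x, np.ones_like(x)]).T   # shape (1, 2)

# The 2x2 normal matrix X^T X is built from a single row, so its rank is 1
# and it cannot be inverted to give a unique least-squares solution.
normal_matrix = X.T @ X
rank = np.linalg.matrix_rank(normal_matrix)
print("rank of X^T X:", rank)

# Any slope fits perfectly once the intercept is chosen to match the point.
slopes = np.array([-1.0, 0.0, 2.0])
intercepts = y[0] - slopes * x[0]
residuals = slopes * x[0] + intercepts - y[0]
print("residuals for three different lines:", residuals)
```

With nothing in the data to prefer one of these lines over another, the simple linear fit has no unique answer, which is the situation the rule of thumb describes.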

I'll start by defining some functions to create plots for the examples below; you can skip reading this code block for now:

In [1]:
# Code to create figures
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('ggplot')


def plot_simple_line():
    rng = np.random.RandomState(42)
    x = 10 * rng.rand(20)
    y = 2 * x + 5 + rng.randn(20)
    p = np.polyfit(x, y, 1)
    xfit = np.linspace(0, 10)
    yfit = np.polyval(p, xfit)
    plt.plot(x, y, 'ok')
    plt.plot(xfit, yfit, color='gray')
    plt.text(9.8, 1, "y = {0:.2f}x + {1:.2f}".format(*p),
             ha='right', size=14)


def plot_underdetermined_fits(p, brange=(-0.5, 1.5), xlim=(-3, 3),
                              plot_conditioned=False):
    rng = np.random.RandomState(42)
    x, y = rng.rand(2, p).round(2)
    xfit = np.linspace(xlim[0], xlim[1])
    for r in rng.rand(20):
        # add a datapoint to make model specified
        b = brange[0] + r * (brange[1] - brange[0])
        xx = np.concatenate([x, [0]])
        yy

Reposted from: http://ynisl.baihongyu.com/
