> "The theory of probabilities is at bottom nothing but common sense reduced to calculus; **it enables us to appreciate with exactness that which accurate minds feel with a sort of instinct for which ofttimes they are unable to account**."
# Summary
How do the law of large numbers (LLN) and the central limit theorem (CLT) together underpin statistical inference?
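| Topic | Step | Role of the [Law of Large Numbers (LLN)](大数定律%20Law%20of%20Large%20Numbers,%20LLN.md) | Role of the [Central Limit Theorem (CLT)](中心极限定理%20CLT.md) |
| --- | --- | --- | --- |
| [Parameter estimation](参数估计.md) | [[点估计]] (point estimation) | Ensures the estimate **converges to the true value** (consistency) | n/a |
| [Parameter estimation](参数估计.md) | [[区间估计]] (interval estimation) | n/a | Supplies the **distribution of the error**, which widens the "point" into an "interval" |
| [[假设检验]] (hypothesis testing) | | n/a | Tells us in advance that the test statistic is approximately normal, which is what makes a p-value computable |
| | **Asymptotic optimality / efficiency** | Consistency from the LLN is a prerequisite | Asymptotic normality from the CLT gives the estimator's limiting variance, which is compared with the Cramér–Rao lower bound (inverse Fisher information) to judge "optimality" |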
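
The division of labour in the table can be checked with a small simulation. This is only a sketch under assumptions that are not in the notes: an Exponential(1) population (true mean 1.0), arbitrary sample sizes and seed, and two hypothetical helpers, `sample_mean` and `normal_interval`.

```python
import math
import random
import statistics

def sample_mean(rng, n, rate=1.0):
    """Mean of n draws from an Exponential(rate) population (true mean 1/rate)."""
    return statistics.fmean(rng.expovariate(rate) for _ in range(n))

def normal_interval(data, z=1.96):
    """Approximate 95% interval for the mean, justified by the CLT for large n."""
    m = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return m - z * se, m + z * se

rng = random.Random(0)  # assumed seed, for repeatability only

# LLN: the point estimate settles near the true mean (1.0) as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(rng, n), 4))

# CLT: over many repeated samples, roughly 95% of the intervals
# should cover the true mean, even though the population is skewed.
trials, n, covered = 1_000, 200, 0
for _ in range(trials):
    data = [rng.expovariate(1.0) for _ in range(n)]
    lo, hi = normal_interval(data)
    covered += lo <= 1.0 <= hi
coverage = covered / trials
print("coverage:", coverage)
```

The first loop is the LLN column (the point estimate becomes trustworthy), and the second is the CLT column (the interval around it has roughly the advertised coverage).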
[*Basic Statistics* (《基础统计学》)](《基础统计学》)
Which statistics books do you wish you had discovered sooner? - Psychonomist's answer - Zhihu
https://www.zhihu.com/question/602368094/answer/3046935637
Probability theory and mathematical statistics are really two different fields that share the same methodology. A simple example shows the difference:
- The numbers of red and white balls in a box are known, and we ask for the probability of drawing a red ball: that is probability theory.
- The numbers of red and white balls in the box are unknown, but we have randomly drawn some balls and counted the red and white ones, and we work backwards to the composition of the box: that is mathematical statistics.
So mathematical statistics is the inverse problem of probability theory. Mathematics has many pairs of inverse problems: addition and subtraction, multiplication and division, differentiation and integration. Probability theory and mathematical statistics are one more such pair. The right order of study is therefore to understand probability theory, the forward problem, first, and then mathematical statistics, the inverse problem. For final-exam review, start by finding out which topics the course examines; organizing the review around those topics makes it several times more efficient.
- [Two ways to plot](#Two%20ways%20to%20plot)
    - [1. Scatter plot](#1.%20Scatter%20plot)
    - [2. Histogram](#2.%20Histogram)
- [Describing a data distribution](#Describing%20a%20data%20distribution)
- [From conditional probability to Bayes' formula](#From%20conditional%20probability%20to%20Bayes'%20formula)
- [Height example - normal distribution](#Height%20example%20-%20normal%20distribution)
- [Basketball shooting example](#Basketball%20shooting%20example)
    - [Scatter plot of the shots](#Scatter%20plot%20of%20the%20shots)
    - [Hits versus misses - binomial distribution](#Hits%20versus%20misses%20-%20binomial%20distribution)
        - [Poisson distribution](#Poisson%20distribution)
        - [HashMap threshold for converting a linked list to a red-black tree](#HashMap%20threshold%20for%20converting%20a%20linked%20list%20to%20a%20red-black%20tree)
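# Probability
## Two ways to plot
### 1. Scatter plot

### 2. Histogram
A histogram is essentially the scatter plot rotated 90 degrees, with the points at each value counted: the count (the thickness of the green band in the figure) becomes the height of the bar.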
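
The step from scatter plot to histogram can be sketched in a few lines. The data here are made up (500 rounded draws from a normal distribution with mean 170 and standard deviation 6, standing in for heights in cm), and the text rendering is just one convenient way to show the counts.

```python
import random
from collections import Counter

rng = random.Random(42)  # assumed seed, for repeatability only

# Scatter view: one (index, value) point per observation.
values = [round(rng.gauss(170, 6)) for _ in range(500)]  # made-up heights in cm
scatter_points = list(enumerate(values))

# Histogram view: forget the index and count how many points share each value.
counts = Counter(values)

# Text rendering: one row per value, bar length equal to the count.
for value in sorted(counts):
    print(f"{value:>4} | {'#' * counts[value]}")
```

Nothing is lost except the ordering: every scatter point contributes exactly one unit to exactly one bar.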

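## Describing a data distribution
The mean, mode, quantiles, and median are all numerical summaries of how the data are distributed.
Variance (the square in the figure) and standard deviation (the side length of that square) describe how far the data as a whole deviate from the horizontal line at the mean.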
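
As a concrete illustration, the summaries above can be computed with Python's standard `statistics` module. The six data values are made up, and the population versions (`pvariance`, `pstdev`) are used so that the variance is literally the average squared deviation from the mean.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up sample

mean = statistics.fmean(data)                # arithmetic mean
median = statistics.median(data)             # middle of the sorted data
mode = statistics.mode(data)                 # most frequent value
quartiles = statistics.quantiles(data, n=4)  # three cut points splitting the data in four

# Variance: the average squared deviation from the mean ("area of the square").
variance = statistics.pvariance(data)
# Standard deviation: the square root of the variance ("side of the square").
std_dev = statistics.pstdev(data)

print(mean, median, mode, quartiles, variance, std_dev)
```

The standard deviation is usually the easier one to read, because it is in the same units as the data, whereas the variance is in squared units.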

## From conditional probability to Bayes' formula

## Height example - normal distribution
## Basketball shooting example

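### Scatter plot of the shots

With the sample index on the horizontal axis and the observed outcome on the vertical axis, the scatter plot shows only two horizontal lines, because there are only two possible observations: the shot is either made or missed.
### Hits versus misses - binomial distribution
What we really care about is the ratio between the two lines, that is, how many shots were made and how many were missed. For a fixed number of shots $n$, all the possible hit/miss splits together with their probabilities make up the binomial distribution. Assuming the shots are independent and each is made with the same probability $p$, the binomial formula gives the probability of any particular split of $k$ hits and $n-k$ misses: $P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}$.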
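
A minimal sketch of that calculation follows. The numbers are assumptions for illustration (10 shots, a shooter who makes 60% of them), and `binomial_pmf` is a hypothetical helper name; the shots are assumed independent with a constant hit probability.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k hits in n independent shots, each made with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.6  # assumed: 10 shots, 60% shooter

# Probability of every possible hit/miss split.
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
for k, prob in enumerate(pmf):
    print(f"{k} hits / {n - k} misses: {prob:.4f}")
```

The probabilities over all splits add up to 1, which is a quick sanity check on the formula.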
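#### Poisson distribution
The Poisson distribution is **at heart still a binomial distribution**; it is used to simplify binomial calculations. It is the limiting form of the binomial formula as the number of trials $n$ (the number of indices) goes to infinity while the expected number of hits $\lambda = np$ stays fixed, so that $p \to 0$: $P(X=k)=\dfrac{e^{-\lambda}\lambda^{k}}{k!}$.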
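
The limit can be observed numerically. In this sketch the values λ = 2 and k = 3 are arbitrary choices, and both helper functions are hypothetical names; the point is only that the gap between the two formulas shrinks as n grows with np held at λ.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam, k = 2.0, 3  # assumed values for illustration
target = poisson_pmf(k, lam)

gaps = []
for n in (10, 100, 1_000, 10_000):
    approx = binomial_pmf(k, n, lam / n)  # keep n * p fixed at lam
    gaps.append(abs(approx - target))
    print(f"n={n:>6}: binomial={approx:.6f}  poisson={target:.6f}")
```

Each tenfold increase in n cuts the gap by roughly a factor of ten, which is why the Poisson formula is a practical stand-in for the binomial when n is large and p is small.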
#### HashMap threshold for converting a linked list to a red-black tree
```java
/*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
*
* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor). The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. In
* usages with well-distributed user hashCodes, tree bins are
* rarely used. Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
*
* 0: 0.60653066
* 1: 0.30326533
* 2: 0.07581633
* 3: 0.01263606
* 4: 0.00157952
* 5: 0.00015795
* 6: 0.00001316
* 7: 0.00000094
* 8: 0.00000006
* more: less than 1 in ten million
*
* The root of a tree bin is normally its first node. However,
* sometimes (currently only upon Iterator.remove), the root might
* be elsewhere, but can be recovered following parent links
* (method TreeNode.root()).
*
* All applicable internal methods accept a hash code as an
* argument (as normally supplied from a public method), allowing
* them to call each other without recomputing user hashCodes.
* Most internal methods also accept a "tab" argument, that is
* normally the current table, but may be a new or old one when
* resizing or converting.
*
* When bin lists are treeified, split, or untreeified, we keep
* them in the same relative access/traversal order (i.e., field
* Node.next) to better preserve locality, and to slightly
* simplify handling of splits and traversals that invoke
* iterator.remove. When using comparators on insertion, to keep a
* total ordering (or as close as is required here) across
* rebalancings, we compare classes and identityHashCodes as
* tie-breakers.
*
* The use and transitions among plain vs tree modes is
* complicated by the existence of subclass LinkedHashMap. See
* below for hook methods defined to be invoked upon insertion,
* removal and access that allow LinkedHashMap internals to
* otherwise remain independent of these mechanics. (This also
* requires that a map instance be passed to some utility methods
* that may create new nodes.)
*
* The concurrent-programming-like SSA-based coding style helps
* avoid aliasing errors amid all of the twisty pointer operations.
*/
```

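### Which shot is the first hit - geometric distribution
Here we care about the index of the first hit, which may be 1, 2, 3, ...; these possible outcomes together with their probabilities make up the geometric distribution. Its formula, $P(X=k)=(1-p)^{k-1}p$, gives for example the probability that the first hit comes on the fifth shot.
Summing these probabilities over $k = 1, \dots, 5$ (the distribution is discrete, so this is a sum rather than an integral) gives the probability that the first hit comes within five shots: $1-(1-p)^{5}$.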
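
Both quantities can be sketched as follows. The hit probability p = 0.3 is an assumption for illustration, and `geometric_pmf` and `geometric_cdf` are hypothetical helper names.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first hit on shot k) = (1 - p)^(k - 1) * p, for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def geometric_cdf(k: int, p: float) -> float:
    """P(first hit within k shots): the sum of the pmf from 1 to k."""
    return sum(geometric_pmf(i, p) for i in range(1, k + 1))

p = 0.3  # assumed hit probability per shot

first_on_fifth = geometric_pmf(5, p)  # four misses, then a hit
within_five = geometric_cdf(5, p)     # at least one hit in the first five shots

print(f"first hit exactly on shot 5: {first_on_fifth:.4f}")
print(f"first hit within 5 shots:    {within_five:.4f}")
```

The sum agrees with the closed form `1 - (1 - p) ** 5`, since "a hit within five shots" is the complement of "five misses in a row".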
# Statistics
The relationship between the two can be explained simply. "Probability" means: if we know that the cup holds three red balls and two blue balls, we can compute the probability of drawing a red ball as $\frac{3}{5}$.

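"Statistics" is the reverse operation of "probability": after drawing from the cup twice and getting one blue ball and one red ball, we infer how many red balls and how many blue balls the cup holds.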
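
One way to make the reverse direction concrete is to score every possible composition of the cup by how likely it makes the observation. This is a sketch under assumptions the notes do not state: the cup is known to hold 5 balls in total, and the two draws are made without replacement. `likelihood` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

TOTAL = 5  # assumption: the cup is known to hold 5 balls

def likelihood(red: int, total: int = TOTAL) -> Fraction:
    """P(one red and one blue in two draws without replacement | `red` red balls)."""
    blue = total - red
    return Fraction(red * blue, comb(total, 2))

# Forward direction (probability): composition known, outcome uncertain.
p_red_single_draw = Fraction(3, TOTAL)  # three red balls out of five

# Reverse direction (statistics): outcome known, composition uncertain.
likelihoods = {red: likelihood(red) for red in range(TOTAL + 1)}
best = max(likelihoods.values())
candidates = [red for red, value in likelihoods.items() if value == best]

for red, value in likelihoods.items():
    print(f"{red} red / {TOTAL - red} blue: likelihood {value}")
print("most plausible red counts:", candidates)
```

With only two draws, 2 red / 3 blue and 3 red / 2 blue explain the data equally well, which shows why the estimate only becomes sharp as more data are collected.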

## First, hypothesize
## Then, test
Data Science | Introduction to Statistics | Channel introduction - video by 工程师和小土豆 - Zhihu
https://www.zhihu.com/zvideo/1326478130794971136