> "The theory of probabilities is at bottom nothing but common sense reduced to calculus; **it enables us to appreciate with exactness that which accurate minds feel with a sort of instinct for which ofttimes they are unable to account**."

# Summary

How do the Law of Large Numbers and the Central Limit Theorem jointly support statistical inference?

| Topic | Step | Role of the [Law of Large Numbers (LLN)](大数定律%20Law%20of%20Large%20Numbers,%20LLN.md) | Role of the [Central Limit Theorem (CLT)](中心极限定理%20CLT.md) |
| --- | --- | --- | --- |
| [Parameter estimation](参数估计.md) | [[点估计\|Point estimation]] | Ensures the estimate **converges to the true value** (consistency) | - |
| [Parameter estimation](参数估计.md) | [[区间估计\|Interval estimation]] | - | Supplies the **error distribution**, which widens the "point" into an "interval" |
| [[假设检验\|Hypothesis testing]] | | - | A p-value can only be given once the test statistic is known to be approximately normal |
| | **Asymptotic optimality / efficiency** | Relies on the consistency given by the LLN as a precondition | Compares the asymptotic variance given by the CLT with the Cramér–Rao lower bound (Fisher information) to judge what is "optimal" |

[《基础统计学》](《基础统计学》)

Which statistics books do you wish you had found sooner? - Psychonomist's answer - Zhihu https://www.zhihu.com/question/602368094/answer/3046935637

Probability theory and mathematical statistics are in fact two different fields, but they share the same methodology. A simple example shows the difference:

- Given the numbers of red and white balls in a box, asking for the probability of drawing a red ball is probability theory;
- Not knowing the numbers of red and white balls in the box, but having randomly drawn a known number of red and white balls, inferring the numbers in the box is mathematical statistics.

So mathematical statistics is the inverse problem of probability theory. Mathematics has many inverse pairs: addition and subtraction, multiplication and division, differentiation and integration. Probability theory and mathematical statistics are one more such pair. The right order of study is therefore to understand probability theory, the forward problem, first, and then mathematical statistics, the inverse problem. For final-exam review, first find out which topics the course examines. Reviewing with those topics as your outline is several times more efficient.

- [Two ways of plotting](#Two%20ways%20of%20plotting)
    - [1. Scatter plot](#1.%20Scatter%20plot)
    - [2. Histogram](#2.%20Histogram)
- [Describing a data distribution](#Describing%20a%20data%20distribution)
- [From conditional probability to the Bayes formula](#From%20conditional%20probability%20to%20the%20Bayes%20formula)
- [Height scenario - normal distribution](#Height%20scenario%20-%20normal%20distribution)
- [Basketball shooting scenario](#Basketball%20shooting%20scenario)
    - [Scatter plot of the shots](#Scatter%20plot%20of%20the%20shots)
    - [Hits versus misses - binomial distribution](#Hits%20versus%20misses%20-%20binomial%20distribution)
        - [Poisson distribution](#Poisson%20distribution)
        - [HashMap threshold for converting a linked list to a red-black tree](#HashMap%20threshold%20for%20converting%20a%20linked%20list%20to%20a%20red-black%20tree)

# Probability

## Two ways of plotting

### 1. Scatter plot

![image.png|1000](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fpicgo%2Fpicgo-clipboard-images%2F2024%2F04%2F27%2F16-46-33-ad1c0b9f7fd8a4aa104604fda01356eb-20240427164632-9bf25c.png)

### 2. Histogram

A histogram is really just the scatter plot rotated 90 degrees, with the number of points at each value (the thickness of the green band in the figure) counted up.

![image.png|1000](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fpicgo%2Fpicgo-clipboard-images%2F2024%2F04%2F27%2F16-44-07-55817ba9b4d5c0b17ea1263c8b59be65-20240427164407-ed8632.png)

## Describing a data distribution

The mean, mode, quantiles, and median are all numerical summaries of how the data are distributed.

The variance (the square in the figure) and the standard deviation (the side length of that square) describe how far the data as a whole deviate from the horizontal line at the mean.

![iShot_2024-04-27_15.50.32.png|1000](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fliuyishou%2Ftmp%2F2024%2F04%2F27%2F15-52-09-bedf455d67a0ea2aae37c95ec741146d-iShot_2024-04-27_15.50.32-d0fb50.png)

## From conditional probability to the Bayes formula

![image.png|1000](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fpicgo%2Fpicgo-clipboard-images%2F2024%2F04%2F27%2F17-29-08-d067397d5d0733f14a53a270add8ef89-20240427172906-0482b5.png)

## Height scenario - normal distribution

## Basketball shooting scenario

![image.png|400](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fpicgo%2Fpicgo-clipboard-images%2F2024%2F04%2F28%2F15-40-12-0b0a09f129a0874d74ed57039172d4b5-20240428154012-47260a.png)

### Scatter plot of the shots

![image.png|1000](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fpicgo%2Fpicgo-clipboard-images%2F2024%2F04%2F28%2F00-35-41-89b30f183c7f24d9ee8f621678dbf063-20240428003540-e966bb.png)

![image.png|1000](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fpicgo%2Fpicgo-clipboard-images%2F2024%2F04%2F28%2F00-35-43-ff5ed89ad53968c8192c0c98d583aedb-20240428003542-f5528a.png)

If the horizontal axis is the sample index and the vertical axis is the observed outcome, the scatter plot contains only two horizontal lines, because there are only two possible observations: the shot either goes in or it misses.

### Hits versus misses - binomial distribution

In essence, what we care about is the ratio of the lengths of the two horizontal lines, that is, how many shots hit and how many missed. The probabilities of all possible hit/miss splits make up the binomial distribution, and its probability formula gives the probability of any particular split. For $n$ shots with hit probability $p$, the probability of exactly $k$ hits is $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$.

#### Poisson distribution

The Poisson distribution is **in essence still a binomial distribution**; it exists to simplify binomial calculations. It is the limiting form of the binomial formula as the number of indices (trials) $n$ goes to infinity while the expected number of hits $\lambda = np$ stays fixed.

#### HashMap threshold for converting a linked list to a red-black tree

```java
/*
 * Implementation notes.
 *
 * This map usually acts as a binned (bucketed) hash table, but
 * when bins get too large, they are transformed into bins of
 * TreeNodes, each structured similarly to those in
 * java.util.TreeMap.
 * Most methods try to use normal bins, but
 * relay to TreeNode methods when applicable (simply by checking
 * instanceof a node). Bins of TreeNodes may be traversed and
 * used like any others, but additionally support faster lookup
 * when overpopulated. However, since the vast majority of bins in
 * normal use are not overpopulated, checking for existence of
 * tree bins may be delayed in the course of table methods.
 *
 * Tree bins (i.e., bins whose elements are all TreeNodes) are
 * ordered primarily by hashCode, but in the case of ties, if two
 * elements are of the same "class C implements Comparable<C>",
 * type then their compareTo method is used for ordering. (We
 * conservatively check generic types via reflection to validate
 * this -- see method comparableClassFor). The added complexity
 * of tree bins is worthwhile in providing worst-case O(log n)
 * operations when keys either have distinct hashes or are
 * orderable, Thus, performance degrades gracefully under
 * accidental or malicious usages in which hashCode() methods
 * return values that are poorly distributed, as well as those in
 * which many keys share a hashCode, so long as they are also
 * Comparable. (If neither of these apply, we may waste about a
 * factor of two in time and space compared to taking no
 * precautions. But the only known cases stem from poor user
 * programming practices that are already so slow that this makes
 * little difference.)
 *
 * Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins. In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used.
 * Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 *
 * The root of a tree bin is normally its first node. However,
 * sometimes (currently only upon Iterator.remove), the root might
 * be elsewhere, but can be recovered following parent links
 * (method TreeNode.root()).
 *
 * All applicable internal methods accept a hash code as an
 * argument (as normally supplied from a public method), allowing
 * them to call each other without recomputing user hashCodes.
 * Most internal methods also accept a "tab" argument, that is
 * normally the current table, but may be a new or old one when
 * resizing or converting.
 *
 * When bin lists are treeified, split, or untreeified, we keep
 * them in the same relative access/traversal order (i.e., field
 * Node.next) to better preserve locality, and to slightly
 * simplify handling of splits and traversals that invoke
 * iterator.remove. When using comparators on insertion, to keep a
 * total ordering (or as close as is required here) across
 * rebalancings, we compare classes and identityHashCodes as
 * tie-breakers.
 *
 * The use and transitions among plain vs tree modes is
 * complicated by the existence of subclass LinkedHashMap. See
 * below for hook methods defined to be invoked upon insertion,
 * removal and access that allow LinkedHashMap internals to
 * otherwise remain independent of these mechanics.
 * (This also requires that a map instance be passed to some
 * utility methods that may create new nodes.)
 *
 * The concurrent-programming-like SSA-based coding style helps
 * avoid aliasing errors amid all of the twisty pointer operations.
 */
```

![image.png|1000](https://imagehosting4picgo.oss-cn-beijing.aliyuncs.com/imagehosting/fix-dir%2Fpicgo%2Fpicgo-clipboard-images%2F2024%2F04%2F28%2F15-34-21-89fd8ac33530c39182a4a8e948f50a8f-20240428153421-aa47c4.png)

### Which shot is the first hit - geometric distribution

Here we care about the index of the first hit, which may be 1, 2, 3, and so on; all of these possibilities together make up the geometric distribution. With hit probability $p$, its probability function $P(X = k) = (1-p)^{k-1} p$ gives, for example, the probability that the first hit comes on the fifth shot.

Summing the geometric distribution over the first five indices (it is discrete, so this is a sum rather than an integral) gives the probability of a hit within five shots: $P(X \le 5) = 1 - (1-p)^5$.

# Statistics

The relationship between probability and statistics can be explained simply. "Probability" means: if we know the cup holds three red balls and two blue balls, we can compute that the probability of drawing a red ball is $\frac{3}{5}$:

![马同学高等数学]()

"Statistics" is the reverse operation: we draw from the cup twice, get one blue ball and one red ball, and ask how many red and how many blue balls the cup holds:

![马同学高等数学]()

## Hypothesize first

## Then test

Data Science | Introduction to Statistics | Channel Introduction - video by 工程师和小土豆 - Zhihu https://www.zhihu.com/zvideo/1326478130794971136
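
The cup example can be worked through in the "hypothesize first, then test" spirit of the two headings above. The sketch below rests on assumptions that the note does not state: the cup is taken to hold exactly 5 balls, and the two draws are taken to be without replacement. Under those assumptions it tries every possible number of red balls and computes how probable the observed draw (one red, one blue) would be in each case. The names `TOTAL_BALLS` and `likelihood` are illustrative only.

```python
from fractions import Fraction
from math import comb

TOTAL_BALLS = 5  # assumption: the cup size is known to be 5


def likelihood(red_in_cup: int, red_drawn: int = 1, blue_drawn: int = 1) -> Fraction:
    """P(observed draw | red_in_cup red balls), drawing without replacement."""
    blue_in_cup = TOTAL_BALLS - red_in_cup
    # Number of ways to pick the observed red and blue balls from the cup.
    favourable = comb(red_in_cup, red_drawn) * comb(blue_in_cup, blue_drawn)
    # Number of ways to pick that many balls of any colour.
    possible = comb(TOTAL_BALLS, red_drawn + blue_drawn)
    return Fraction(favourable, possible)


# Hypothesize each composition in turn, then check it against the data.
likelihoods = {r: likelihood(r) for r in range(TOTAL_BALLS + 1)}
best = max(likelihoods.values())
most_plausible = [r for r, p in likelihoods.items() if p == best]

for r, p in likelihoods.items():
    print(f"red={r}, blue={TOTAL_BALLS - r}: P(1 red, 1 blue) = {p}")
print("most plausible red counts:", most_plausible)
```

With only two draws, the compositions "2 red, 3 blue" and "3 red, 2 blue" explain the data equally well, while an all-red or all-blue cup is ruled out. This shows why the inverse problem usually needs more observations than the forward one.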