<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Python]]></title><description><![CDATA[Python]]></description><link>https://www.actuaryunion.com/category/22</link><generator>RSS for Node</generator><lastBuildDate>Wed, 17 Jun 2026 07:13:48 GMT</lastBuildDate><atom:link href="https://www.actuaryunion.com/category/22.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 19 Apr 2024 08:58:37 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[The Simplest tkinter GUI Use Case]]></title><description><![CDATA[<p dir="auto">When packaging internal tools, I used to rely on an input() statement to have the user type in a file path:</p>
<pre><code>file_path = input('Enter the file path: ')
</code></pre>
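<p dir="auto">While looking into GUIs recently, I found a more user-friendly approach: use the tkinter library to open a dialog window and pick the file directly. The code and the result are shown below (select_file() picks a file, select_directory() picks a folder).</p>
<pre><code>import tkinter as tk
from tkinter import filedialog

def select_file():
    root = tk.Tk()
    root.withdraw()  # hide the main window
    file_path = filedialog.askopenfilename()
    root.destroy()  # release the hidden window once the dialog closes
    return file_path

def select_directory():
    root = tk.Tk()
    root.withdraw()  # hide the main window
    dir_path = filedialog.askdirectory()
    root.destroy()  # release the hidden window once the dialog closes
    return dir_path

if __name__ == '__main__':
    print(select_file())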
</code></pre>
<p dir="auto"><img src="/assets/uploads/files/1713517029926-aeefbf95-3d04-432c-a322-a22f1bb7c057-image.png" alt="aeefbf95-3d04-432c-a322-a22f1bb7c057-image.png" class="img-responsive img-markdown" /></p>
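<p dir="auto">One detail worth handling is the case where the user closes the dialog without choosing anything. The sketch below is only an illustration: normalize_choice and select_file_or_none are hypothetical helper names, and it assumes that a cancelled dialog returns an empty string (or, on some platforms, an empty tuple).</p>
<pre><code class="language-python">from pathlib import Path


def normalize_choice(raw):
    """Turn a file-dialog result into a Path, or None if the user cancelled."""
    # assumption: a cancelled dialog yields an empty string or an empty tuple
    if not raw:
        return None
    return Path(raw)


def select_file_or_none(title="Select a file"):
    # imported lazily so normalize_choice can be used without a display
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.withdraw()  # hide the main window
    try:
        return normalize_choice(filedialog.askopenfilename(title=title))
    finally:
        root.destroy()
</code></pre>
<p dir="auto">Returning None lets the calling code tell a cancelled dialog apart from a real path before it tries to open anything.</p>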
]]></description><link>https://www.actuaryunion.com/topic/271/最简单的tkinter-gui应用场景</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/271/最简单的tkinter-gui应用场景</guid><dc:creator><![CDATA[Brad]]></dc:creator><pubDate>Fri, 19 Apr 2024 08:58:37 GMT</pubDate></item><item><title><![CDATA[An Interesting Behavior of the Plus Sign in Python]]></title><description><![CDATA[<p dir="auto">Today I came across an interesting Python behavior:<br />
<img src="/assets/uploads/files/1655310348625-21ee2973-8444-4c46-a623-d65a86d4cd93-image.png" alt="21ee2973-8444-4c46-a623-d65a86d4cd93-image.png" class="img-responsive img-markdown" /><br />
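The + sign here does not seem to be read the way one would expect. The code runs without raising an error, so it can take a while of debugging before the problem is noticed.</p>
<p dir="auto">Stack Overflow has a good answer on this:<br />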
<a href="https://stackoverflow.com/questions/53162/how-can-i-do-a-line-break-line-continuation-in-python" rel="nofollow">https://stackoverflow.com/questions/53162/how-can-i-do-a-line-break-line-continuation-in-python</a></p>
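<br />
The screenshot is not reproduced here, so the following is only a guess at the kind of code involved: a minimal sketch, assuming the issue is a continuation line that starts with +. Python treats such a line as a separate statement (a unary plus) unless the whole expression is wrapped in parentheses.</p>
<pre><code class="language-python">total = 1
+ 2            # a separate statement: unary plus applied to 2, result discarded

fixed = (1
         + 2)  # inside parentheses the expression continues across lines

print(total, fixed)
</code></pre>
<p dir="auto">Here total stays at 1 while fixed is 3, and neither version raises an error.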
]]></description><link>https://www.actuaryunion.com/topic/225/python-中加号的有趣行为</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/225/python-中加号的有趣行为</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Wed, 15 Jun 2022 16:26:24 GMT</pubDate></item><item><title><![CDATA[A Summary of Common Garbled-Text Types in Python]]></title><description><![CDATA[<p dir="auto">I recently ran into garbled text several times when reading files with Python and wanted to deal with it in one go, so I collected the solutions that others have already worked out:<br />
1. Causes of several kinds of garbled text:<br />
<img src="/assets/uploads/files/1654611083041-0607.jpg" alt="0607.jpg" class="img-responsive img-markdown" /></p>
<p dir="auto">Source: xuan196 <a href="https://blog.csdn.net/xuan196/article/details/115127416" rel="nofollow">https://blog.csdn.net/xuan196/article/details/115127416</a></p>
<p dir="auto">2. Handling garbled Chinese text in Python:<br />
2.1 Set encoding = 'utf-8' on the object that comes back garbled</p>
<pre><code>    response = requests.get(url=url, headers=headers)
    response.encoding = 'utf-8'
</code></pre>
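<p dir="auto">2.2 First encode the string as iso-8859-1 to get the raw bytes back, then decode those bytes with the correct encoding (gbk in this example)</p>
<pre><code> # a workaround for Chinese text that was wrongly decoded as iso-8859-1
 img_name = img_name.encode('iso-8859-1').decode('gbk')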
</code></pre>
<p dir="auto">Source: Ctrl精 <a href="https://blog.csdn.net/qq_43468607/article/details/116154254?spm=1001.2101.3001.6650.6&amp;utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7ERate-6-116154254-blog-120749614.pc_relevant_paycolumn_v3&amp;depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7EBlogCommendFromBaidu%7ERate-6-116154254-blog-120749614.pc_relevant_paycolumn_v3&amp;utm_relevant_index=10" rel="nofollow">https://blog.csdn.net/qq_43468607/article/details/116154254?spm=1001.2101.3001.6650.6&amp;utm_medium=distribute.pc_relevant.none-task-blog-2~default~BlogCommendFromBaidu~Rate-6-116154254-blog-120749614.pc_relevant_paycolumn_v3&amp;depth_1-utm_source=distribute.pc_relevant.none-task-blog-2~default~BlogCommendFromBaidu~Rate-6-116154254-blog-120749614.pc_relevant_paycolumn_v3&amp;utm_relevant_index=10</a></p>
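<p dir="auto">The round trip in 2.2 can be reproduced without any network request. This is a minimal sketch with a made-up string: it first simulates the bug (GBK bytes decoded as iso-8859-1) and then applies the fix.</p>
<pre><code class="language-python">original = "中文文件名"   # made-up sample text

# simulate the bug: GBK bytes wrongly decoded as iso-8859-1
garbled = original.encode("gbk").decode("iso-8859-1")

# the fix: recover the raw bytes, then decode them with the right codec
recovered = garbled.encode("iso-8859-1").decode("gbk")

print(garbled != original, recovered == original)
</code></pre>
<p dir="auto">The trick works because iso-8859-1 maps every byte value to a character, so no information is lost by the wrong decode. It only helps when the bytes really were GBK in the first place.</p>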
<p dir="auto">These two approaches solved every garbled-text problem I ran into recently. Thanks to the authors of both articles; I am sharing this for everyone's benefit and will remove it on request if it infringes.</p>
]]></description><link>https://www.actuaryunion.com/topic/223/python常见乱码类型总结</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/223/python常见乱码类型总结</guid><dc:creator><![CDATA[chilli_drop]]></dc:creator><pubDate>Tue, 07 Jun 2022 14:48:02 GMT</pubDate></item><item><title><![CDATA[[Help] How to Show Only Some X-Axis Labels in Altair]]></title><description><![CDATA[Reference:
https://altair-viz.github.io/user_guide/generated/core/altair.Axis.html
https://vega.github.io/vega/docs/expressions/
]]></description><link>https://www.actuaryunion.com/topic/222/求助-altair-如何显示部分x轴标签</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/222/求助-altair-如何显示部分x轴标签</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Tue, 07 Jun 2022 14:33:47 GMT</pubDate></item><item><title><![CDATA[Playwright Web Scraping in Practice: Discord]]></title><description><![CDATA[With that, I got it to play a sound whenever I send a message containing "mengke". That is as far as I have gone for now; if you have ideas, feel free to share them.
]]></description><link>https://www.actuaryunion.com/topic/221/playwright-爬虫实战篇之discord</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/221/playwright-爬虫实战篇之discord</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sat, 04 Jun 2022 11:41:27 GMT</pubDate></item><item><title><![CDATA[[A Hands-On Guide to Becoming a Capitalist] A Python Program That Automatically Nags for Updates on WeChat]]></title><description><![CDATA[The code is here: compressed code.zip
If this error appears, the search box is probably not fully visible on screen.
]]></description><link>https://www.actuaryunion.com/topic/219/手把手教你如何成为资本家-微信自动催更程序python</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/219/手把手教你如何成为资本家-微信自动催更程序python</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sat, 28 May 2022 10:20:40 GMT</pubDate></item><item><title><![CDATA[Classic Textbook: Python编程_1ed_Eric Matthes]]></title><description><![CDATA[<p dir="auto">Classic textbook: Python编程_1ed_Eric Matthes</p>
<p dir="auto"><a href="/assets/uploads/files/1645884805434-python%E7%BC%96%E7%A8%8B_1ed_eric-matthes.pdf">Python编程_1ed_Eric Matthes.pdf</a></p>
]]></description><link>https://www.actuaryunion.com/topic/185/经典教材-python编程_1ed_eric-matthes</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/185/经典教材-python编程_1ed_eric-matthes</guid><dc:creator><![CDATA[Howie Jie]]></dc:creator><pubDate>Sat, 26 Feb 2022 14:13:34 GMT</pubDate></item><item><title><![CDATA[How to Automate Your PPT Creation with Python]]></title><description><![CDATA[Someone in the group chat mentioned that if you use R, the xaringan package is a convenient option; see https://slides.yihui.org/xaringan/zh-CN.html#1
]]></description><link>https://www.actuaryunion.com/topic/119/如何用python自动化你的ppt制作</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/119/如何用python自动化你的ppt制作</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Wed, 30 Dec 2020 07:51:04 GMT</pubDate></item><item><title><![CDATA[Packaging a .py File as an .exe: A Walkthrough and Lessons Learned]]></title><description><![CDATA[After packaging, I found that bundling the whole program into a single file makes it start up very slowly.
Then I found this answer online:

When a program is bundled into a single file, it has to be unpacked every time it is run.
Packaging it into a folder with --onedir avoids that step and speeds up startup noticeably:
pyinstaller --onedir -w pydocument.py
]]></description><link>https://www.actuaryunion.com/topic/100/封装py文件为exe文件-攻略及心得体会</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/100/封装py文件为exe文件-攻略及心得体会</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sun, 01 Nov 2020 05:53:46 GMT</pubDate></item><item><title><![CDATA[Tired of Not Knowing How Far Your Loop or apply Has Got? Here Comes the Python Progress Bar!]]></title><description><![CDATA[tqdm is very easy to use: wrap the iterable you would normally loop over in tqdm() and you will see the progress.
from tqdm import tqdm

for i in tqdm(range(100)):
    i  = i * 2

If you are using Jupyter Notebook, tqdm.notebook.tqdm or tqdm.auto.tqdm is recommended:
from tqdm.notebook import tqdm
# from tqdm.auto import tqdm
for i in tqdm(range(100)):
    i  = i * 2

The progress bar drawn by this function looks nicer.
If you are using pandas apply, tqdm can show its progress too.
Code source: https://stackoverflow.com/questions/18603270/progress-indicator-during-pandas-operations
import pandas as pd
import numpy as np
from tqdm import tqdm
# from tqdm.auto import tqdm  # for notebooks

df = pd.DataFrame(np.random.randint(0, int(1e8), (10000, 1000)))

# Create and register a new `tqdm` instance with `pandas`
# (can use tqdm_gui, optional kwargs, etc.)
tqdm.pandas()

# Now you can use `progress_apply` instead of `apply`
df.groupby(0).progress_apply(lambda x: x**2)

]]></description><link>https://www.actuaryunion.com/topic/118/苦于不知道loop或apply运行的进度-python-progress-bar来啦</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/118/苦于不知道loop或apply运行的进度-python-progress-bar来啦</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sat, 24 Oct 2020 11:21:48 GMT</pubDate></item><item><title><![CDATA[Fitting a Cox Model with the lifelines Package]]></title><description><![CDATA[The core code is as follows:
# platform is the name of the dataset (a pandas DataFrame)
# check for missing values; any that exist need to be filled
platform.isna().sum()

# simple handling of missing values
platform.fillna(0, inplace = True)

# fit the model
from lifelines import CoxPHFitter
cph = CoxPHFitter()
cph.fit(platform, duration_col='Survival_years', event_col='Survival',formula = "factor1 + factor2")

# inspect the results
cph.print_summary()

]]></description><link>https://www.actuaryunion.com/topic/117/使用lifelines包进行cox模型拟合</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/117/使用lifelines包进行cox模型拟合</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sun, 27 Sep 2020 11:40:54 GMT</pubDate></item><item><title><![CDATA[An Example of Computing Motor Insurance Premiums with a Python Class]]></title><description><![CDATA[The code and data are attached:
compressed_file.zip
]]></description><link>https://www.actuaryunion.com/topic/106/一个用python类计算车险保费的例子</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/106/一个用python类计算车险保费的例子</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sun, 02 Aug 2020 07:39:30 GMT</pubDate></item><item><title><![CDATA[How to Read Excel Directly with pandas]]></title><description><![CDATA[The code is as follows:
import pandas as pd
tables = pd.read_excel("./premium_preparation.xlsm", sheet_name = [0,1])
rate_table = tables[0]
short_rate_table = tables[1]

]]></description><link>https://www.actuaryunion.com/topic/105/如何用pandas-直接读取excel</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/105/如何用pandas-直接读取excel</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sun, 02 Aug 2020 05:50:37 GMT</pubDate></item><item><title><![CDATA[Python Basics: A Collection of Notes]]></title><description><![CDATA[<p dir="auto">These are Python basics I put together a while ago. The formatting is rough in many places, so please bear with me.</p>
<h3>change the working directory of anaconda</h3>
<p dir="auto">In the terminal, run</p>
<pre><code>jupyter notebook --generate-config
</code></pre>
<p dir="auto">Modify the config file and restart Anaconda Navigator:</p>
<p dir="auto">Open the jupyter_notebook_config.py file in any text editor and set the “c.NotebookApp.notebook_dir” entry to the desired working directory. In a Windows file path, replace each “\” with “\\”. Uncomment the line by removing the leading “#”.</p>
<p dir="auto">Save the file and restart the Anaconda Navigator.</p>
<h3>get current working directory</h3>
<pre><code class="language-python">import os

os.getcwd()
</code></pre>
<h3>enumerate</h3>
<ul>
<li>loop through the items</li>
</ul>
<pre><code class="language-python">lst = ["app", "banana", "gig"]
for thing in lst:
    print(thing)
</code></pre>
<ul>
<li>index + item: use enumerate</li>
</ul>
<pre><code class="language-python">lst = ["app", "banana", "gig"]
for idx, thing in enumerate(lst):
    print(idx)
    print(thing)
</code></pre>
<h3>How to find a subset of a list</h3>
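<pre><code class="language-python"># keep the items of lst that are greater than x (avoid naming a variable "list", which shadows the built-in)
sublist = [i for i in lst if i &gt; x]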
</code></pre>
<h3>The summation of list</h3>
<p dir="auto">Adding two lists with + concatenates them and keeps duplicates, similar to UNION ALL in SQL.</p>
<pre><code class="language-python">a = [1,2,3]
b = [3,3,4]
a+b
# [1, 2, 3, 3, 3, 4]
</code></pre>
<h3>Differences of loc and iloc and []</h3>
<ul>
<li>
<p dir="auto">Difference between df['col_name'].values and df&lsqb;&lsqb;'col_name'&rsqb;&rsqb;.values. The former gives a 1d array and the latter gives a 2d array</p>
</li>
<li>
<p dir="auto">For row selection with a boolean mask, loc[] gives the same result as []. With a single label they differ: df['a'] selects a column, while df.loc['a'] selects a row. Calling loc[] explicitly makes the intent clearer.</p>
</li>
<li>
<p dir="auto">Avoid chained indexing such as Ax['s']['as']. It can be replaced by Ax.loc['as','s'].</p>
</li>
<li>
<p dir="auto">To index by column name and row position without chained indexing:</p>
</li>
</ul>
<pre><code class="language-python">df.loc[df.index[0], 'NAME']
# or
df.iloc[0, df.columns.get_loc("a")]
</code></pre>
<ul>
<li>loc is label-based, which means that we have to specify the name of the rows and columns that we need to filter out.
<ul>
<li>For example, let’s say we search for the rows whose index is 1, 2 or 100. We will not get the first, second or the hundredth row here. Instead, we will get the results only if the name of any index is 1, 2 or 100.</li>
</ul>
</li>
</ul>
<pre><code class="language-python"># select all rows with a condition
data.loc[data.age &gt;= 15]
# select with multiple conditions
data.loc[(data.age &gt;= 12) &amp; (data.gender == 'M')]
# Select a range of rows with a label slice
data.loc[1:3]
# Unlike iloc, a loc slice includes both endpoints. It returns every row located between label 1 and label 3 in the current row order, so on an unsorted index the result is not necessarily the rows labelled 1, 2 and 3.
# Select only required columns with a condition
data.loc[(data.age &gt;= 12), ['city', 'gender'&rsqb;&rsqb;
# update a column with condition
data.loc[(data.age &gt;= 12), ['section'&rsqb;&rsqb; = 'M'
# update multiple columns with condition
data.loc[(data.age &gt;= 20), ['section', 'city'&rsqb;&rsqb; = ['S','Pune']

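# select a column (all rows); data.loc with only a list of names would look the names up among the row labels
data.loc[:, ['col_name'&rsqb;&rsqb;

# select rows by condition and a single column
data.loc[data.age &gt;= 12,'col_name']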
</code></pre>
<ul>
<li>On the other hand, iloc is integer index-based. So here, we have to specify rows and columns by their integer index.</li>
</ul>
<pre><code class="language-python"># select rows with indexes
data.iloc&lsqb;&lsqb;0,2&rsqb;&rsqb;
# select rows with particular indexes and particular columns
data.iloc&lsqb;&lsqb;0,2],[1,3&rsqb;&rsqb;
# select a range of rows
data.iloc[1:3]
# select a range of rows and columns
data.iloc[1:3,2:4]
</code></pre>
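<p dir="auto">The difference between label-based and position-based selection is easiest to see on an index that does not start at 0. The sketch below uses a small made-up DataFrame; the column names and values are assumptions for illustration only.</p>
<pre><code class="language-python">import pandas as pd

# made-up data: the row labels are 1, 2, 3 rather than 0, 1, 2
data = pd.DataFrame(
    {"age": [10, 15, 20], "city": ["Pune", "Delhi", "Agra"]},
    index=[1, 2, 3],
)

by_label = data.loc[1, "city"]       # row whose label is 1 (the first row)
by_position = data.iloc[1, 1]        # second row, second column

label_slice = data.loc[1:2]          # label slice: both endpoints included
position_slice = data.iloc[1:2]      # position slice: end excluded

print(by_label, by_position, len(label_slice), len(position_slice))
</code></pre>
<p dir="auto">With these labels, loc[1] and iloc[1] point at different rows, and the two slices return two rows and one row respectively.</p>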
<h3>How to slice series</h3>
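<pre><code class="language-python"># df_temp is a pandas Series holding the number of missing values in each column
df_temp = df_all.isnull().sum(axis=0)
# keep only the columns that have at least one missing value
df_temp[df_temp&gt;0]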
</code></pre>
<ul>
<li>Select a particular column</li>
</ul>
<pre><code class="language-python">df['label']
</code></pre>
<h3>Basic picture</h3>
<pre><code class="language-python"># packages
import matplotlib.pyplot as plt
%matplotlib inline

df_train['label'].value_counts().plot(kind='bar')
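# draw each image in its own subplot
# images is assumed to be a list of up to 6 row positions in df_train
fig = plt.figure(figsize=(18 ,10))

for idx, row in enumerate(images):
    ax = fig.add_subplot(2,3,idx + 1)
    ax.set_xticks([])
    ax.set_yticks([])
    # 28 x 28 = 784 pixel values, assumed to sit in columns 1 to 784 (column 0 being the label)
    pixels = df_train.iloc[row, 1:785].values.reshape((28,28))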
    ax.imshow(pixels, cmap="gray")
    ax.set_title(df_train.iloc[row]['label'], fontsize = 24)
</code></pre>
<h3>Difference index, array, list</h3>
<h4>Array / List</h4>
<p dir="auto">Lists and arrays both store data of any type (strings, integers, etc.) in Python, and both can be indexed and iterated over. They differ in the operations they support. For example, dividing a NumPy array by 4 divides every element, whereas dividing a list by 4 raises a TypeError.</p>
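<p dir="auto">A minimal sketch of that difference, assuming "array" here means a NumPy array; the values are made up:</p>
<pre><code class="language-python">import numpy as np

values = [4, 8, 12]

quarter = np.array(values) / 4      # element-wise division works on an array

try:
    values / 4                      # a plain list does not support division
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__

print(quarter.tolist(), list_error)
</code></pre>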
<h4>index</h4>
<p dir="auto">Index, on the other hand, is immutable<br />
Index: Immutable ndarray implementing an ordered, sliceable set.</p>
<p dir="auto">Properties</p>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Index.values</th>
<th>Return an array representing the data in the Index.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Index.is_monotonic</td>
<td>Alias for is_monotonic_increasing (deprecated in pandas 1.5 and removed in 2.0).</td>
</tr>
<tr>
<td>Index.is_monotonic_increasing</td>
<td>Return True if each value is equal to or greater than the previous one.</td>
</tr>
<tr>
<td>Index.is_monotonic_decreasing</td>
<td>Return True if each value is equal to or less than the previous one.</td>
</tr>
<tr>
<td>Index.is_unique</td>
<td>Return if the index has unique values.</td>
</tr>
</tbody>
</table>
<h3>dataframe.sum</h3>
<p dir="auto">dataframe.sum(axis=0)<br />
sums down the rows, giving one total per column</p>
<h4>drop / dropna / isna /fillna</h4>
<ul>
<li>index of missing values in a particular column</li>
</ul>
<pre><code class="language-python">idx_missing = df[column].isna()
</code></pre>
<ul>
<li>find the rows without missing values (~ negates a boolean mask; unary minus is not supported on boolean data)</li>
</ul>
<pre><code class="language-python">df[~idx_missing]
df.loc[~idx_missing]
</code></pre>
<ul>
<li>fill na with value</li>
</ul>
<pre><code class="language-python"># fill NA with "No College"
# inplace=True applies the change to the original object and returns None
nba["College"].fillna("No College", inplace = True) 
# fill NA with the column mean; this only makes sense for a numeric column ("Salary" is an assumed column name)
nba["Salary"].fillna(nba["Salary"].mean(), inplace = True) 
# method: used when no fill value is passed. ffill propagates the previous valid value forward; bfill uses the next valid value.
# (recent pandas versions prefer nba["College"].ffill() over the method argument)
nba["College"].fillna(method = "ffill", inplace = True)
# this fills each missing value with the value before it

</code></pre>
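<p dir="auto">To make the direction of ffill and bfill concrete, here is a small sketch on a made-up Series. It uses the ffill() and bfill() methods directly rather than the method argument of fillna.</p>
<pre><code class="language-python">import pandas as pd

s = pd.Series([1.0, None, None, 4.0])   # made-up values with two gaps

forward = s.ffill()             # each gap takes the previous valid value
backward = s.bfill()            # each gap takes the next valid value
with_mean = s.fillna(s.mean())  # mean() skips missing values by default

print(forward.tolist(), backward.tolist(), with_mean.tolist())
</code></pre>
<p dir="auto">None of these calls modifies s; each returns a new Series, which avoids the surprises that inplace = True can cause.</p>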
<h3>Matrix operation</h3>
<p dir="auto">NumPy provides functions for the common matrix operations:</p>
<ul>
<li>
<p dir="auto">add() − add elements of two matrices.</p>
</li>
<li>
<p dir="auto">subtract() − subtract elements of two matrices.</p>
</li>
<li>
<p dir="auto">divide() − divide elements of two matrices.</p>
</li>
<li>
<p dir="auto">multiply() − multiply elements of two matrices.</p>
</li>
<li>
<p dir="auto">dot() − performs matrix multiplication, not element-wise multiplication.</p>
</li>
<li>
<p dir="auto">sqrt() − square root of each element of matrix.</p>
</li>
<li>
<p dir="auto">sum(x,axis) − sums the elements of the matrix. The second argument is optional: axis 0 gives the column sums and axis 1 gives the row sums.</p>
</li>
<li>
<p dir="auto">“T” − It performs transpose of the specified matrix.</p>
</li>
</ul>
<pre><code class="language-python">import numpy
# Two matrices are initialized by value
x = numpy.array(&lsqb;&lsqb;1, 2], [4, 5&rsqb;&rsqb;)
y = numpy.array(&lsqb;&lsqb;7, 8], [9, 10&rsqb;&rsqb;)
#  add()is used to add matrices
print ("Addition of two matrices: ")
print (numpy.add(x,y))
# subtract()is used to subtract matrices
print ("Subtraction of two matrices : ")
print (numpy.subtract(x,y))
# divide()is used to divide matrices
print ("Matrix Division : ")
print (numpy.divide(x,y))
print ("Multiplication of two matrices: ")
print (numpy.multiply(x,y))
print ("The product of two matrices : ")
print (numpy.dot(x,y))
print ("square root is : ")
print (numpy.sqrt(x))
print ("The summation of elements : ")
print (numpy.sum(y))
print ("The column wise summation  : ")
print (numpy.sum(y,axis=0))
print ("The row wise summation: ")
print (numpy.sum(y,axis=1))
# using "T" to transpose the matrix
print ("Matrix transposition : ")
print (x.T)
</code></pre>
<h3>lambda functions</h3>
<pre><code class="language-python">x = lambda a: a+1
x = lambda a, b : a * b

# Apply lambda function in dataframe
df['Percent Growth'].apply(lambda x: x.replace('%', '')).astype('float')
</code></pre>
<h3>reshape numpy array</h3>
<p dir="auto">NumPy allows one dimension of the new shape to be given as -1, e.g. (2,-1) or (-1,3), but not (-1, -1). It marks that dimension as unknown and asks NumPy to work it out from the length of the array and the remaining dimensions. The new shape must still hold exactly as many elements as the original array.</p>
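<p dir="auto">A short sketch of how the -1 placeholder is resolved, using a made-up array of 12 elements:</p>
<pre><code class="language-python">import numpy as np

a = np.arange(12)

two_rows = a.reshape(2, -1)      # 12 / 2 = 6 columns are inferred
three_cols = a.reshape(-1, 3)    # 12 / 3 = 4 rows are inferred

try:
    a.reshape(5, -1)             # 12 is not divisible by 5
    bad = None
except ValueError as exc:
    bad = type(exc).__name__

print(two_rows.shape, three_cols.shape, bad)
</code></pre>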
<h3>String formats</h3>
<p dir="auto">The format() method formats the specified value(s) and inserts them into the string's placeholders.</p>
<p dir="auto">A placeholder is written with curly brackets: {}. It can be empty, numbered or named.</p>
<p dir="auto">The format() method returns the formatted string.</p>
<pre><code class="language-python">txt1 = "My name is {fname}, I'm {age}".format(fname = "John", age = 36)
txt2 = "My name is {0}, I'm {1}".format("John",36)
txt3 = "My name is {}, I'm {}".format("John",36)
</code></pre>
<h4>Inside the placeholders you can add a formatting type to format the result</h4>
<ul>
<li><code>:&lt;</code>		Left aligns the result (within the available space)</li>
<li><code>:&gt;</code>		Right aligns the result (within the available space)</li>
<li><code>:^</code>		Center aligns the result (within the available space)</li>
<li><code>:=</code>		Places the sign to the left most position</li>
<li><code>:+</code>		Use a plus sign to indicate if the result is positive or negative</li>
<li><code>:-</code>		Use a minus sign for negative values only</li>
<li><code>: </code>		Use a space to insert an extra space before positive numbers (and a minus sign before negative numbers)</li>
<li><code>:,</code>		Use a comma as a thousand separator</li>
<li><code>:_</code>		Use an underscore as a thousand separator</li>
<li><code>:b</code>		Binary format</li>
<li><code>:c</code>		Converts the value into the corresponding unicode character</li>
<li><code>:d</code>		Decimal format</li>
<li><code>:e</code>		Scientific format, with a lower case e</li>
<li><code>:E</code>		Scientific format, with an upper case E</li>
<li><code>:f</code>		Fixed-point number format; :.2f keeps 2 digits after the decimal point</li>
<li><code>:F</code>		Fixed-point number format, in uppercase format (show inf and nan as INF and NAN)</li>
<li><code>:g</code>		General format</li>
<li><code>:G</code>		General format (using an upper case E for scientific notation)</li>
<li><code>:o</code>		Octal format</li>
<li><code>:x</code>		Hex format, lower case</li>
<li><code>:X</code>		Hex format, upper case</li>
<li><code>:n</code>		Number format</li>
<li><code>:%</code>		Percentage format</li>
</ul>
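<p dir="auto">A few of these format types in action. This is only a sketch with made-up values; the variable names are chosen for illustration.</p>
<pre><code class="language-python">centered = "{:^9}".format("mid")         # centered within 9 characters
thousands = "{:,}".format(1234567)       # comma as thousand separator
underscored = "{:_}".format(1234567)     # underscore as thousand separator
signed = "{:+.2f}".format(3.14159)       # explicit sign, 2 decimal places
bases = "{:b} {:o} {:x} {:X}".format(10, 10, 255, 255)
percent = "{:.1%}".format(0.256)         # multiplied by 100, with a % sign
scientific = "{:e}".format(12345.678)    # lower case e

for text in (centered, thousands, underscored, signed, bases, percent, scientific):
    print(repr(text))
</code></pre>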
<h3>Open a file</h3>
<p dir="auto">The available modes are:</p>
<table class="table table-bordered table-striped">
<thead>
<tr>
<th>Character</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>'r'</td>
<td>open for reading (default)</td>
</tr>
<tr>
<td>'w'</td>
<td>open for writing, truncating the file first</td>
</tr>
<tr>
<td>'x'</td>
<td>open for exclusive creation, failing if the file already exists</td>
</tr>
<tr>
<td>'a'</td>
<td>open for writing, appending to the end of the file if it exists</td>
</tr>
<tr>
<td>'b'</td>
<td>binary mode</td>
</tr>
<tr>
<td>'t'</td>
<td>text mode (default)</td>
</tr>
<tr>
<td>'+'</td>
<td>open for updating (reading and writing)</td>
</tr>
</tbody>
</table>
<p dir="auto">The default mode is 'r' (open for reading text, synonym of 'rt'). Modes 'w+' and 'w+b' open and truncate the file (its contents are cleared first). Modes 'r+' and 'r+b' open the file with no truncation.</p>
<p dir="auto">Python distinguishes between binary and text I/O. Files opened in binary mode (including 'b' in the mode argument) return contents as bytes objects without any decoding. In text mode (the default, or when 't' is included in the mode argument), the contents of the file are returned as str, the bytes having been first decoded using a platform-dependent encoding or using the specified encoding if given.</p>
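<p dir="auto">The modes can be tried out safely in a temporary directory. The following is a minimal sketch; demo.txt is a made-up file name.</p>
<pre><code class="language-python">import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")           # made-up file name

    with open(path, "w", encoding="utf-8") as f:   # create or truncate
        f.write("first\n")
    with open(path, "a", encoding="utf-8") as f:   # append to the end
        f.write("second\n")

    try:
        open(path, "x")                            # exclusive creation
        exclusive_error = None
    except FileExistsError as exc:
        exclusive_error = type(exc).__name__

    with open(path, "r", encoding="utf-8") as f:   # text mode returns str
        text = f.read()
    with open(path, "rb") as f:                    # binary mode returns bytes
        raw = f.read()

print(repr(text), type(raw).__name__, exclusive_error)
</code></pre>
<p dir="auto">Passing encoding explicitly avoids depending on the platform default mentioned above.</p>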
<h3>flatten</h3>
<p dir="auto">numpy.ndarray.flatten() function</p>
<p dir="auto">The flatten() function returns a copy of a given array collapsed into one dimension.</p>
<p dir="auto">‘C’ means to flatten in row-major (C-style) order. ‘F’ means to flatten in column-major (Fortran- style) order. ‘A’ means to flatten in column-major order if a is Fortran contiguous in memory, row-major order otherwise. ‘K’ means to flatten a in the order the elements occur in memory. The default is ‘C’.</p>
<pre><code class="language-python">ndarray.flatten(order='C')
</code></pre>
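<p dir="auto">A small sketch of the two most common orders, on a made-up 2 x 3 array. It also shows that flatten() returns a copy.</p>
<pre><code class="language-python">import numpy as np

m = np.arange(6).reshape(2, 3)   # rows: 0 1 2 and 3 4 5

row_major = m.flatten(order="C")
col_major = m.flatten(order="F")

row_major[0] = 99                # changing the copy leaves m untouched

print(row_major.tolist(), col_major.tolist(), m[0, 0])
</code></pre>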
<h3>underscore in python</h3>
<p dir="auto">Underscore _ is considered as "I don't Care" or "Throwaway" variable in Python</p>
<p dir="auto">The underscore _ is also used for ignoring the specific values. If you don’t need the specific values or the values are not used, just assign the values to underscore.</p>
<pre><code class="language-python">x, _, y = (1, 2, 3)

&gt;&gt;&gt; x
1

&gt;&gt;&gt; y 
3
</code></pre>
<h3>.copy()</h3>
<pre><code class="language-python">df_copy = df_all
</code></pre>
<p dir="auto">Here df_copy and df_all are linked: they are two names for the same object, much like Set in VBA.</p>
<pre><code class="language-python">df_copy = df_all.copy()
</code></pre>
<p dir="auto">This creates a new object, so the two are no longer linked.</p>
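<p dir="auto">The difference can be checked directly. This is a minimal sketch on a made-up DataFrame; alias and independent are illustrative names.</p>
<pre><code class="language-python">import pandas as pd

df_all = pd.DataFrame({"a": [1, 2, 3]})   # made-up data

alias = df_all                # a second name for the same object
independent = df_all.copy()   # a new object with its own data

df_all.loc[0, "a"] = 100      # modify the original in place

print(alias is df_all, alias.loc[0, "a"], independent.loc[0, "a"])
</code></pre>
<p dir="auto">The change shows up through alias but not in independent.</p>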
<h3>Flatten a list</h3>
<p dir="auto">Given a list of lists l</p>
<pre><code class="language-python">flat_list = [item for sublist in l for item in sublist]
</code></pre>
<h3>The equivalent of in in pandas</h3>
<p dir="auto">data.isin([]) tests, element by element, whether each value is contained in the given list.</p>
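<p dir="auto">A short sketch of isin on a made-up Series, including how to negate the result:</p>
<pre><code class="language-python">import pandas as pd

data = pd.Series(["Pune", "Delhi", "Agra", "Pune"])   # made-up values

mask = data.isin(["Pune", "Agra"])    # element-wise membership test
selected = data[mask]
excluded = data[~mask]                # ~ negates the boolean mask

print(mask.tolist(), selected.tolist(), excluded.tolist())
</code></pre>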
]]></description><link>https://www.actuaryunion.com/topic/98/python基础知识整理</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/98/python基础知识整理</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Mon, 22 Jun 2020 22:19:53 GMT</pubDate></item><item><title><![CDATA[Interactive Visualization of General Insurance Data: Using Python's Bokeh Package]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">Following Alonso's previous post, <a href="https://mp.weixin.qq.com/s/AST3iupnJsg0TLao1E5sYg" rel="nofollow">Analyzing General Insurance Data with Python: A Beginner's Guide</a>, I visualized the same data with Bokeh server.</p>
<p dir="auto">A quick introduction to Bokeh: Bokeh is a Python package for building interactive visualizations; R has a comparable package called shiny (<a href="https://shiny.rstudio.com/" rel="nofollow">https://shiny.rstudio.com/</a>). Bokeh's support for Chinese is currently not great, but in this post we use JS to switch the page language to Chinese. Bokeh can be used in two ways:</p>
<ul>
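<li>The first is without Bokeh server. This already produces good-looking interactive plots with panning, zooming, hover tooltips and so on, and the result can be saved as a static HTML file.</li>
<li>The second is with Bokeh server, building a web application. This allows more features, such as filtering and querying data. A common setup is Flask + Bokeh, with Bokeh embedded in a Flask application. Our example does not use Flask; it relies on Bokeh's default HTML template, which is written in Jinja, so many things that could otherwise be customized are restricted.</li>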
</ul>
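<p dir="auto">This post covers the second approach, a web application built on Bokeh server. The code is based on two samples from the Bokeh Gallery, movies and crossfilter; the links are in the references at the end.</p>
<p dir="auto">First, a demo of the result:</p>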
<ul>
<li>Filtering the data:</li>
</ul>
<p dir="auto"><img src="/assets/uploads/files/1598597017417-1.gif" alt="1.gif" class="img-responsive img-markdown" /></p>
<ul>
<li>Panning, selection and data tooltips:</li>
</ul>
<p dir="auto"><img src="/assets/uploads/files/1598597041281-2.gif" alt="2.gif" class="img-responsive img-markdown" /></p>
<ul>
<li>Editing the data by dragging points:</li>
</ul>
<p dir="auto"><img src="/assets/uploads/files/1598597055630-3.gif" alt="3.gif" class="img-responsive img-markdown" /></p>
<p dir="auto">The interactive chart is currently hosted at http://49.234.103.189:5006/test.</p>
<h2>Steps</h2>
<h3>Installing Bokeh</h3>
<p dir="auto">Plain Python users, run in a terminal:</p>
<pre><code>pip install bokeh
</code></pre>
<p dir="auto">conda users:</p>
<pre><code>conda install bokeh
</code></pre>
<h3>File tree</h3>
<p dir="auto">The file tree we need looks roughly like this:</p>
<p dir="auto"><img src="https://imgkr.cn-bj.ufileos.com/51fb779a-8a77-41a2-abf3-24c2fdac1743.png" alt class="img-responsive img-markdown" /></p>
<p dir="auto">The app folder contains three items: main.py, our main Python file; a templates folder holding index.html, which extends the base HTML template; and lidata.csv, our data source file.</p>
<h3>Analyzing the data</h3>
<p dir="auto">We want to filter the data by company (公司), line of business (险种) and coverage type (险别), so we first need the distinct values in each of these columns.</p>
<pre><code class="language-python">import pandas as pd

# lidata is Alonso's dataset
df_all = pd.read_csv(r'./app/data/lidata.CSV', header = 0)
# compute the ULR
df_all['ULR'] = df_all['UL'] / df_all['EP']
# "All" is prepended so that the user can select every value
unique_company = ["All"] + df_all['公司'].unique().tolist()
unique_business = ["All"] + df_all['险种'].unique().tolist() 
unique_product = ["All"] + df_all['险别'].unique().tolist() 
</code></pre>
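<p dir="auto">Each coverage type (险别) should be shown in its own color:</p>
<pre><code class="language-python">import bokeh.palettes as pl  # assumed import; the original snippet does not show it

color = pl.mpl['Plasma'][len(unique_product)]
# Plasma is one of the palettes that ship with Bokeh, a convenient way to get a pleasant color scheme
df_all["color"] = [color[unique_product.index(pro)] for pro in df_all["险别"].values]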
</code></pre>
<p dir="auto">We also filter on the accident year (事故年) to display, so we read the earliest and the latest accident year.</p>
<pre><code class="language-python">year_start = df_all['事故年'].min()
year_end =  df_all['事故年'].max()
</code></pre>
<p dir="auto">The last thing to prepare is which column is plotted on the y axis. A dictionary maps each menu option to the name of its data column.</p>
<pre><code class="language-python">axis_map = {
    "ULR": "ULR",
    "ULAE": "EP",
    "DAC":'DAC'
}
</code></pre>
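<p dir="auto">Next comes the plotting. The page is split into two parts: the part on the left is called the controls and the part on the right is the plot.</p>
<h3>Building the controls</h3>
<pre><code class="language-python">from bokeh.models import RangeSlider, Select, Slider

# year_range: the range of accident years to display
year_range = RangeSlider(start=year_start, end=year_end, value=(year_start,year_end), step=1,
                       title="展示年")
# single-year sliders for the first and last year shown (not part of controls below)
min_year = Slider(title="开始展示年", start=year_start, end=year_end, value=year_start, step=1)
max_year = Slider(title="结束展示年", start=year_start, end=year_end, value=year_end, step=1)
# dropdowns for company (公司), line of business (险种) and coverage type (险别)
company = Select(title="公司选择", value="All",
               options=unique_company)
business = Select(title="险种选择", value="All",
               options=unique_business)
product = Select(title="险别选择", value="All",
               options=unique_product)
# which value is plotted on the y axis
y_axis = Select(title="展示值", options=sorted(axis_map.keys()), value="ULR")

controls = [company, business, product, year_range,  y_axis]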
</code></pre>
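<h3>Building the plot</h3>
<pre><code class="language-python">from bokeh.models import ColumnDataSource, PointDrawTool
from bokeh.plotting import figure

# empty data source that update() fills later (assumed definition; the original snippet does not show it)
source = ColumnDataSource(data=dict(x=[], y=[], com=[], year=[], business=[], pro=[], color=[]))

# TOOLTIPS defines the label shown when the mouse hovers over a data point
TOOLTIPS=[
    ("公司为", "@com"),
    ("年:", "@year"),
    ("险别为", "@business"),
    ("险种为", "@pro")
]
# TOOLS lists which tools are shown, such as pan
TOOLS="pan,wheel_zoom,box_select,lasso_select,reset"
# note: Bokeh 3.x renamed plot_height / plot_width to height / width
p = figure(tools=TOOLS,plot_height=100, plot_width=200, title="", toolbar_location="above", tooltips=TOOLTIPS, sizing_mode="scale_both")
r = p.circle(x="x",y="y" ,source=source, size=10, color = 'color', alpha=0.6, hover_color='white', hover_alpha=0.5)
# PointDrawTool has to be added separately
draw_tool = PointDrawTool(renderers=[r], empty_value='black')
p.add_tools(draw_tool)
p.toolbar.active_tap = draw_tool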

</code></pre>
<h3>Updating the data</h3>
<pre><code class="language-python">def select_products():
    # strip() removes leading and trailing whitespace
    company_val = company.value.strip()
    business_val = business.value.strip()
    product_val = product.value.strip()
    # filter on the accident year
    selected = df_all[
        (df_all.事故年 &gt;= year_range.value[0]) &amp;
        (df_all.事故年 &lt;= year_range.value[1]) 
    ]
    # filter on company, line of business and coverage type
    if (company_val != "All"):
        selected = selected[selected.公司.str.contains(company_val)==True]
    if (business_val != "All"):
        selected = selected[selected.险种.str.contains(business_val)==True]
    if (product_val != "All"):
        selected = selected[selected.险别.str.contains(product_val)==True]
    return selected

# this function refreshes the data source
def update():
    df = select_products()
    x_name = "事故年"
    y_name = axis_map[y_axis.value]
    p.title.text = "%d points selected" % len(df)
    source.data = dict(
        x=df[x_name],
        y=df[y_name],
        com=df["公司"].values,
        year=df["事故年"].values,
        business=df["险别"].values,
        pro=df["险种"].values,
        color = df["color"]
    )
# rerun update() whenever any of the controls changes
for control in controls:
    control.on_change('value', lambda attr, old, new: update())
</code></pre>
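<p dir="auto">One thing to keep in mind about select_products() is that str.contains does a substring (and by default regular-expression) match, so choosing a company named "A" would also keep a company named "AB". The sketch below illustrates this with pandas only. The toy rows, the English column names and the select_rows helper are all assumptions for illustration and are not part of the app.</p>
<pre><code class="language-python">import pandas as pd

# made-up rows; "company" and "year" stand in for 公司 and 事故年
df = pd.DataFrame({
    "company": ["A", "AB", "B", "A"],
    "year": [2015, 2016, 2017, 2018],
})


def select_rows(frame, company_val, year_lo, year_hi, exact=True):
    # between() includes both ends, like the two comparisons in select_products()
    selected = frame[frame["year"].between(year_lo, year_hi)]
    if company_val != "All":
        if exact:
            selected = selected[selected["company"] == company_val]
        else:
            # substring match; regex=False treats the value literally
            selected = selected[selected["company"].str.contains(company_val, regex=False)]
    return selected


exact_match = select_rows(df, "A", 2015, 2017)["company"].tolist()
substring_match = select_rows(df, "A", 2015, 2017, exact=False)["company"].tolist()
everything = select_rows(df, "All", 2015, 2017)["company"].tolist()

print(exact_match, substring_match, everything)
</code></pre>
<p dir="auto">Since the dropdown options come from the column's own unique values, an equality test is probably closer to what the user expects, but either choice works when no value is a substring of another.</p>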
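<h3>Generating the page</h3>
<pre><code class="language-python">from bokeh.io import curdoc
from bokeh.layouts import column, layout

# inputs is the control panel on the left of the plot
inputs = column(*controls, width=320, height=1000)
inputs.sizing_mode = "fixed"

l = layout([
    [inputs, p],
], sizing_mode="scale_both")

update()
# add_root takes a single root model; p is already part of the layout l
curdoc().add_root(l)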

</code></pre>
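<h3>The templates folder: changing the page's basic styling through Bokeh's built-in Jinja template</h3>
<p dir="auto">This is where the templates folder comes in. Its index.html extends the Jinja template.</p>
<p dir="auto">The Jinja template is shown below. We cannot edit it directly; anything we want to change has to be changed afterwards with JS.</p>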
<pre><code class="language-html">&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
{% block head %}
&lt;head&gt;
    {% block inner_head %}
    &lt;meta charset="utf-8"&gt;
    &lt;title&gt;{% block title %}{{ title | e if title else "Bokeh Plot" }}{% endblock %}&lt;/title&gt;
    {% block preamble %}{% endblock %}
    {% block resources %}
        {% block css_resources %}
        {{ bokeh_css | indent(8) if bokeh_css }}
        {% endblock %}
        {% block js_resources %}
        {{ bokeh_js | indent(8) if bokeh_js }}
        {% endblock %}
    {% endblock %}
    {% block postamble %}{% endblock %}
    {% endblock %}
&lt;/head&gt;
{% endblock %}
{% block body %}
&lt;body&gt;
    {% block inner_body %}
    {% block contents %}
        {% for doc in docs %}
        {{ embed(doc) if doc.elementid }}
        {% for root in doc.roots %}
            {{ embed(root) | indent(10) }}
        {% endfor %}
        {% endfor %}
    {% endblock %}
    {{ plot_script | indent(8) }}
    {% endblock %}
&lt;/body&gt;
{% endblock %}
&lt;/html&gt;
</code></pre>
<p dir="auto">The basic form of index.html is as follows:</p>
<pre><code class="language-html">{% extends base %}

&lt;!-- goes in head --&gt;
{% block preamble %}
&lt;link href="app/static/css/custom.min.css" rel="stylesheet"&gt;
{% endblock %}

&lt;!-- goes in body --&gt;
{% block contents %}
&lt;div&gt; {{ embed(roots.scatter) }} &lt;/div&gt;
&lt;div&gt; {{ embed(roots.line) }} &lt;/div&gt;
{% endblock %}
</code></pre>
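<p dir="auto">I added a bit of code to the standard template to make sure the lang attribute of the html element is zh, i.e. Chinese. Together with some changes to the CSS file, that completes the job.</p>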
<pre><code class="language-js">window.onload = function() {
  document.querySelector("html").lang = "zh";
};
</code></pre>
<h3>Running</h3>
<p dir="auto">In a terminal, cd to the folder that contains app and run:</p>
<pre><code>bokeh serve app
</code></pre>
<p dir="auto">or</p>
<pre><code>bokeh serve --show app
</code></pre>
<p dir="auto">or run with debug logging:</p>
<pre><code>bokeh serve --log-level=debug app
</code></pre>
<p dir="auto">If several Python versions are installed and may conflict, use:</p>
<pre><code>python3 -m bokeh serve app
</code></pre>
<p dir="auto">The full code is on GitHub: <a href="https://github.com/Mengkee/bokeh_example" rel="nofollow">https://github.com/Mengkee/bokeh_example</a></p>
<h2>References</h2>
<p dir="auto"><a href="https://docs.bokeh.org/en/latest/docs/gallery.html" rel="nofollow">Bokeh Gallery</a></p>
<p dir="auto"><a href="https://demo.bokeh.org/movies" rel="nofollow">Bokeh Sample: movies</a></p>
<p dir="auto"><a href="https://demo.bokeh.org/crossfilter" rel="nofollow">Bokeh Sample: crossfilter</a></p>
]]></description><link>https://www.actuaryunion.com/topic/73/财险数据交互式可视化-运用python的bokeh包</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/73/财险数据交互式可视化-运用python的bokeh包</guid><dc:creator><![CDATA[Mengkelyu]]></dc:creator><pubDate>Sun, 14 Jun 2020 09:13:16 GMT</pubDate></item><item><title><![CDATA[Analyzing General Insurance Data with Python: A Beginner's Guide]]></title><description><![CDATA[<p dir="auto">The article is on the 精算后花园 blog: <a href="http://actuarygarden.cn" rel="nofollow">actuarygarden.cn</a></p>
<p dir="auto"><a href="https://actuarygarden.cn/Python-General-Insurance-data/" rel="nofollow">Analyzing General Insurance Data with Python: A Beginner's Guide</a></p>
<p dir="auto">Here is the data used in the article so that you can try it yourself:</p>
<p dir="auto"><a href="/assets/uploads/files/1592035763792-%E5%81%87%E6%95%B0%E6%8D%AE.rar">假数据.rar</a></p>
]]></description><link>https://www.actuaryunion.com/topic/71/用python分析财险数据-菜鸟向</link><guid isPermaLink="true">https://www.actuaryunion.com/topic/71/用python分析财险数据-菜鸟向</guid><dc:creator><![CDATA[Alonso]]></dc:creator><pubDate>Sat, 13 Jun 2020 07:39:53 GMT</pubDate></item></channel></rss>