第100+1步 ChatGPT文献复现：ARIMAX预测肺结核 vol. 1

本文主要是介绍第100+1步 ChatGPT文献复现：ARIMAX预测肺结核 vol. 1，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

基于WIN10的64位系统演示

一、写在前面

各位大佬，好久不见。

《100步入门机器学习》肝完了，不懂大家学了多少了，默认你们都学完了吧。

今年我们换一个玩法（灌水）：一系列更接近实战的教程，复现一些经典的文献。

姑且叫做《100+X步入门机器学习》！

首先，从ARIMAX模型开始，龙年第一篇：

《PLoS One》杂志的2023年一篇题目为《A comparative study of three models to analyze the impact of air pollutants on the number of pulmonary tuberculosis cases in Urumqi, Xinjiang》文章的公开数据做案例。

这文章做的是用：空气污染物对新疆乌鲁木齐肺结核病例数影响的比较研究。

为了让大家一步一个脚印（多水教程），我就一次只弄1-2个结果咯。

二、学习和复现：结果一描述性分析

首先，结果一长这样：

（1）咒语

[复制结果2的图片]

请问这个论文结果使用了什么方法？

并给出python代码，同时运行这个代码，输出最终的输出图片。

原始数据见附件“{pone.0277314.s006.xlsx}”，其中只需要使用前两列进行分析：“Time”以及“PTB cases”。

你可以一步一步来，进行完一步操作以后，保存结果，然后问我是否继续分析。我说继续分析，你才继续。

请用中回答。

输入咒语，跟着GPT回复：

这张论文结果图显示了时间序列分解，这通常涉及到将时间序列数据分解成趋势、季节性和随机成分。从图中可以看出，有四个子图分别代表原始的时间序列（PTB cases），季节性指数（Seasonal Index），趋势（Trend），以及周期性成分（Cycle）。

在Python中，可以使用statsmodels库的季节性分解功能进行这种分析。首先，我们需要加载数据并使用statsmodels中的seasonal_decompose函数。我将开始这个过程，首先加载数据，然后进行初步的时间序列分解。如果结果正确，我们可以继续进一步的分析步骤。

现在我会加载数据，并展示如何使用Python进行季节性分解。

（2）GPT也给出对应的代码：

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt
# 加载数据，这次使用正确的列名
file_path = 'pone.0277314.s006.xlsx'
data_check = pd.read_excel(file_path)  # Load data to check the column names
corrected_column = data_check.columns[1]  # This is the actual column name with the invisible character# Now loading the data with the correct column name
data = pd.read_excel(file_path, usecols=['Time', corrected_column])# Convert 'Time' to datetime and set as the index for the data
data['Time'] = pd.to_datetime(data['Time'])
data.set_index('Time', inplace=True)# Perform the seasonal decomposition
decomposition = seasonal_decompose(data[corrected_column], model='additive', period=12)plt.rc('figure', figsize=(10, 8))  # Adjusting the size may help with layout issues# Plot the decomposition results
decomposition.plot()# Use tight layout to automatically adjust subplot parameters
plt.tight_layout()plt.subplots_adjust(left=0.05)  # 减少左侧边距# Save the decomposition plot to a file
fig = plt.gcf()  # Get the current figure object
adjusted_img_path = '/mnt/data/adjusted_decomposed_timeseries.png'
fig.savefig(adjusted_img_path, bbox_inches='tight')  # Save the figure with tight bounding box

给出代码的基本步骤和功能：

1、使用pandas库加载Excel文件，并只选择了'Time'和'PTB cases'这两列。

2、将'Time'列转换为datetime对象，并设置为数据框的索引。

3、使用seasonal_decompose函数从statsmodels库中进行时间序列的季节性分解。

4、设置图表大小，绘制并展示分解结果。

5、保存结果图为PNG格式。

（3）试着在本地Python运行：

在Spyder中输入：

可以运行，但是吧，左边空了一大片。

此时，你可以继续求助GPT，注意看我的咒语：

GPT给出优化方案：

图片左侧的空白区域可能是由于日期格式不正确或者是matplotlib在绘图时默认的布局设置导致的。解决这个问题的方法可以有以下几个：

检查和修正日期格式：确保时间序列是正确的datetime对象，并且是以正确的格式显示在x轴上。

使用紧凑布局：plt.tight_layout()可以自动调整子图参数，让它们更好地适应图表区域。

调整子图间距：使用plt.subplots_adjust()可以手动调整子图的布局设置。

修改图表尺寸：调整图表的尺寸以适应子图的显示。

优化x轴标签：如果x轴的刻度标签太长或者不必要，可以缩短它们或者完全移除。

这回答提醒了我：

因为原始数据的日期是中文格式：2014年1月，所以代码不支持。

我只需要把日期格式改为：2014-1，即可：

不过继续完善横坐标，也就是横坐标从2014-01开始，每隔三个月展示一个日期，2014-01、2014-03、一直到2018-12，这里就不演示了。

（4）我有一个大胆的想法：

请使用英文帮我就这个图片写一段SCI论文的result，要求精简：

Results

Temporal Decomposition of PTB Cases in Urumqi

We employed time series decomposition to assess the patterns of PTB (Pulmonary Tuberculosis) cases in Urumqi from January 2014 to December 2018. The decomposition model allowed us to dissect the time series into four distinct components: observed, seasonal, trend, and residual.

The observed data (Fig. 2, top panel) indicates the number of PTB cases reported each month over the five-year period. Fluctuations in the data illustrate the dynamic nature of PTB incidence over time. The seasonal component (Fig. 2, second panel) reveals a consistent pattern repeating yearly, suggesting a regular influence of seasonal factors on PTB cases.

The trend component (Fig. 2, third panel) demonstrates a gradual decrease in the number of PTB cases over the years, signaling a promising downtrend in incidence. This decline may reflect the effectiveness of ongoing public health interventions and improvements in local healthcare practices.

Lastly, the cyclical component (Fig. 2, bottom panel), which captures fluctuations adjusted for seasonality and trend, suggests additional, irregular forces at play that periodically affect PTB case numbers. These could correspond to sporadic outbreaks or changes in reporting practices.

Our analysis underscores the multifaceted nature of PTB case dynamics in Urumqi, shaped by seasonal patterns, long-term trends, and irregular episodic factors. The discernible seasonal patterns highlight potential opportunities for targeted seasonal interventions, while the overall downward trend is indicative of successful long-term disease control measures in the region.

不错，写得很详细，下次不要描述的那么详细：

In the time series decomposition of PTB cases in Urumqi from January 2014 to December 2018, we observed a clear seasonal pattern and a long-term downward trend in cases. The decomposition highlights consistent seasonal fluctuations and suggests the effectiveness of health interventions over the studied period.

能不能用，大家自行对照原文，认真对照学习（看看怎么白嫖合适）。

三、个人感悟

利用GPT-4的多模态能力，不仅可以浏览和理解学术文献，还能直接提取其中的编程代码，并且在本地平台上即时运行这些代码，这极大地提高了研究（白嫖）和学习（偷懒）的效率。

四、数据

链接：https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277314

这篇关于第100+1步 ChatGPT文献复现：ARIMAX预测肺结核 vol. 1的文章就介绍到这儿，希望我们推荐的文章对编程师们有所帮助！