Materials:

  • For ML research and engineering you need at least Python 3, pandas, numpy, scikit-learn, matplotlib, tensorflow, and jupyterlab.
  • PyTorch 1.7 (RTX 3090) troubleshooting: reference

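Environment setup: comparing the M1 chip and the NVIDIA RTX 3090

The tests use the FashionMNIST dataset:

A brief introduction to FashionMNIST: the original MNIST dataset contains only the handwritten digits 0-9, which is too simple to serve as a benchmark for modern CV tasks and makes it hard to validate a model effectively. FashionMNIST contains 60,000 training images and 10,000 test images. As in the original MNIST, each image is a 28 × 28 grayscale image, but the content is mainly fashion items such as clothes, shoes, and handbags. The images are more complex, so compared with handwritten digits they validate a model more effectively.

The dataset used is Fashion-MNIST, available at the following link: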

https://github.com/zalandoresearch/fashion-mnist

Experiment

#import libraries
import tensorflow as tf
import time

#download fashion mnist dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

train_set_count = len(train_labels) #60000
test_set_count = len(test_labels) #10000

#setup start time
t0 = time.time()

#normalize images
train_images = train_images / 255.0
test_images = test_images / 255.0

#create ML model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])

#compile ML model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

#train ML model
model.fit(train_images, train_labels, epochs=10)

#evaluate ML model on test set
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)

#setup stop time
t1 = time.time()
total_time = t1-t0

#print results
print('\n')
print(f'Training set contained {train_set_count} images')
print(f'Testing set contained {test_set_count} images')
print(f'Model achieved {test_acc:.2f} testing accuracy')
print(f'Training and testing took {total_time:.2f} seconds')

RTX 3090 test configuration

System: Linux

CPU: Intel® Core™ V4-2650

RAM: 128GB

Storage: 1TB SSD

TensorFlow environment setup:

docker container run -ti tensorflow/tensorflow

>>> print(f'Training set contained {train_set_count} images')
Training set contained 60000 images
>>> print(f'Testing set contained {test_set_count} images')
Testing set contained 10000 images
>>> print(f'Model achieved {test_acc:.2f} testing accuracy')
Model achieved 0.83 testing accuracy
>>> print(f'Training and testing took {total_time:.2f} seconds')
Training and testing took 40.90 seconds

Training for 10 epochs, including testing, took 40.90 s in total.
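The single timer in the script covers normalization, training, and evaluation together. A per-phase timer can make the comparison between machines easier to interpret. The following is a minimal sketch using only the standard library; the `timed` helper and the phase labels are hypothetical and not part of the original script, and the loop bodies are stand-ins for `model.fit(...)` and `model.evaluate(...)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}

with timed("train", timings):
    sum(i * i for i in range(100_000))  # stand-in for model.fit(...)

with timed("test", timings):
    sum(range(10_000))  # stand-in for model.evaluate(...)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

Reporting training and testing separately shows which phase dominates the total on each machine.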

M1 test configuration

System: macOS Big Sur

Storage: 512GB SSD

Unified Memory: 16GB

The M1 chip has 8 CPU cores, 8 GPU cores, and a 16-core Neural Engine.

>>> print(f'Training set contained {train_set_count} images')
Training set contained 60000 images
>>> print(f'Testing set contained {test_set_count} images')
Testing set contained 10000 images
>>> print(f'Model achieved {test_acc:.2f} testing accuracy')
Model achieved 0.20 testing accuracy
>>> print(f'Training and testing took {total_time:.2f} seconds')
Training and testing took 6.85 seconds

Conclusion: did I buy a fake 3090!?

It turns out the 3090 run was executing on the CPU.
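A quick way to catch this before benchmarking is to ask TensorFlow which GPUs it can see. The sketch below assumes TensorFlow 2.1 or later, where `tf.config.list_physical_devices` is available; the `visible_gpus` helper is a hypothetical name introduced here, and it returns `None` when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of the GPUs TensorFlow can see, or None without TensorFlow."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment")
elif not gpus:
    print("No GPU visible: training will run on the CPU")
else:
    print(f"GPUs visible to TensorFlow: {gpus}")
```

An empty list inside the container would indicate that the image or the Docker runtime does not expose the GPU.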

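Further exploration: deploying a TensorFlow GPU environment with Docker

To enable GPU support in a TensorFlow Docker container, you need an NVIDIA GPU with the driver correctly installed. You also need to install nvidia-docker; entering the commands from the quickstart section of the official documentation line by line is enough.

After installation, add the --runtime=nvidia option to the docker container run command and start the container from a TensorFlow Docker image with GPU support, i.e.: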

docker container run -it --runtime=nvidia tensorflow/tensorflow:latest-gpu-py3 bash

Results

>>> #train ML model
... model.fit(train_images, train_labels, epochs=10)
Train on 60000 samples
Epoch 1/10
60000/60000 [==============================] - 6s 92us/sample - loss: 0.4969 - accuracy: 0.8244
Epoch 2/10
60000/60000 [==============================] - 5s 91us/sample - loss: 0.3787 - accuracy: 0.8631
Epoch 3/10
60000/60000 [==============================] - 5s 88us/sample - loss: 0.3386 - accuracy: 0.8772
Epoch 4/10
60000/60000 [==============================] - 5s 86us/sample - loss: 0.3150 - accuracy: 0.8853
Epoch 5/10
60000/60000 [==============================] - 5s 86us/sample - loss: 0.2967 - accuracy: 0.8906
Epoch 6/10
60000/60000 [==============================] - 5s 88us/sample - loss: 0.2787 - accuracy: 0.8962
Epoch 7/10
60000/60000 [==============================] - 5s 86us/sample - loss: 0.2698 - accuracy: 0.9006
Epoch 8/10
60000/60000 [==============================] - 5s 86us/sample - loss: 0.2576 - accuracy: 0.9036
Epoch 9/10
60000/60000 [==============================] - 5s 88us/sample - loss: 0.2489 - accuracy: 0.9070
Epoch 10/10
60000/60000 [==============================] - 5s 89us/sample - loss: 0.2392 - accuracy: 0.9104

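This took about 45 s, which is still slow.

Looking more closely, something seems off: on the 3090 GPU each epoch reports exactly 60000 samples, while on the M1 it reports only 1875. I have not found the reason yet. 60000/1875 = 32; could the M1 really be running 32 cores in parallel? A more likely explanation is that 32 is the default Keras batch size: newer TensorFlow versions report progress in batches (steps) rather than samples, and 60000/32 = 1875.

So no firm conclusion can be drawn yet. It does show one thing: the M1 is indeed fast.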
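The factor of 32 can be checked with a few lines of arithmetic. This sketch assumes the Keras default `batch_size=32` for `model.fit` and that one log counts samples while the other counts batches; `steps_per_epoch` is a hypothetical helper, not a TensorFlow function.

```python
import math


def steps_per_epoch(num_samples, batch_size=32):
    """Number of batches needed to see every sample once per epoch."""
    return math.ceil(num_samples / batch_size)


train_samples = 60000
print(steps_per_epoch(train_samples))       # batches per epoch at batch size 32
print(steps_per_epoch(train_samples, 1))    # one sample per step
```

If this reading is right, both machines process the same 60000 images per epoch and the two progress bars differ only in their unit.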

Setting up RNN-Lyapunov-Spectrum

python3 -m venv venv-RNN-Lyapunov-Spectrum
source $HOME/venv-RNN-Lyapunov-Spectrum/bin/activate

Install PyTorch (reference):

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install matplotlib scipy psutil tqdm

Test code

Setting up RNN-RC-Chaos

It currently runs only on Linux and requires Python 3.7.3, so we use a conda env to create the virtual environment.

conda create -n python3.7 python=3.7.3
conda activate python3.7
pip3 install tensorflow==1.14.0
pip3 install matplotlib scikit-learn psutil mpi4py
pip3 install torch scipy tqdm #pytorch
# Downloading torch-1.7.1-cp37-cp37m-manylinux1_x86_64.whl (776.8 MB)

#!/bin/bash
# 01_ESN_auto.sh
# Parameters

cd ../../../Methods

# method: esn
python3 RUN.py esn \
--mode all \
--display_output 1 \
--system_name Lorenz3D \
--write_to_log 1 \
--N 100000 \
--N_used 1000 \
--RDIM 1 \
--noise_level 10 \
--scaler Standard \
--approx_reservoir_size 1000 \
--degree 10 \
--radius 0.6 \
--sigma_input 1 \
--regularization 0.0 \
--dynamics_length 200 \
--iterative_prediction_length 500 \
--num_test_ICS 2 \
--solver auto \
--number_of_epochs 1000000 \
--learning_rate 0.001 \
--reference_train_time 10 \
--buffer_train_time 0.5
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Created by: Vlachas Pantelis, CSE-lab, ETH Zurich
"""

import sys
from Config.global_conf import global_params
sys.path.insert(0, global_params.global_utils_path)
from plotting_utils import *
from global_utils import *

import argparse


def getModel(params):
    sys.path.insert(0, global_params.py_models_path.format(params["model_name"]))
    if params["model_name"] == "esn":
        import esn as model
        return model.esn(params)
    elif params["model_name"] == "esn_parallel":
        import esn_parallel as model
        return model.esn_parallel(params)
    elif params["model_name"] == "rnn_statefull":
        import rnn_statefull as model
        return model.rnn_statefull(params)
    elif params["model_name"] == "rnn_statefull_parallel":
        import rnn_statefull_parallel as model
        return model.rnn_statefull_parallel(params)
    elif params["model_name"] == "mlp":
        import mlp as model
        return model.mlp(params)
    else:
        raise ValueError("model not found.")


def runModel(params_dict):
    if params_dict["mode"] in ["train", "all"]:
        trainModel(params_dict)
    if params_dict["mode"] in ["test", "all"]:
        testModel(params_dict)
    return 0


def trainModel(params_dict):
    model = getModel(params_dict)
    model.train()
    model.delete()
    del model
    return 0


def testModel(params_dict):
    model = getModel(params_dict)
    model.testing()
    model.delete()
    del model
    return 0


def defineParser():
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(help='Selection of the model.', dest='model_name')

    esn_parser = subparsers.add_parser("esn")
    esn_parser = getESNParser(esn_parser)

    esn_parallel_parser = subparsers.add_parser("esn_parallel")
    esn_parallel_parser = getESNParallelParser(esn_parallel_parser)

    rnn_statefull_parser = subparsers.add_parser("rnn_statefull")
    rnn_statefull_parser = getRNNStatefullParser(rnn_statefull_parser)

    rnn_statefull_parallel_parser = subparsers.add_parser("rnn_statefull_parallel")
    rnn_statefull_parallel_parser = getRNNStatefullParallelParser(rnn_statefull_parallel_parser)

    mlp_parser = subparsers.add_parser("mlp")
    mlp_parser = getMLPParser(mlp_parser)

    mlp_parallel_parser = subparsers.add_parser("mlp_parallel")
    mlp_parallel_parser = getMLPParallelParser(mlp_parallel_parser)

    return parser


def main():
    parser = defineParser()
    args = parser.parse_args()
    print(args.model_name)
    args_dict = args.__dict__

    # for key in args_dict:
    #     print(key)

    # DEFINE PATHS AND DIRECTORIES
    args_dict["saving_path"] = global_params.saving_path.format(args_dict["system_name"])
    args_dict["model_dir"] = global_params.model_dir
    args_dict["fig_dir"] = global_params.fig_dir
    args_dict["results_dir"] = global_params.results_dir
    args_dict["logfile_dir"] = global_params.logfile_dir
    args_dict["train_data_path"] = global_params.training_data_path.format(args.system_name, args.N)
    args_dict["test_data_path"] = global_params.testing_data_path.format(args.system_name, args.N)
    args_dict["worker_id"] = 0

    runModel(args_dict)


if __name__ == '__main__':
    main()
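The link between the shell script and this file is the argparse subparser: the first positional argument (`esn`) selects a sub-command and is stored under `model_name`, and the remaining flags become entries of the parameter dictionary. The sketch below illustrates that mechanism only. The three flags and their types are assumptions chosen for illustration, since the real flags are registered by helpers such as `getESNParser` that are not shown here.

```python
import argparse


def define_parser():
    """Build a parser with one sub-command, mirroring the structure of defineParser()."""
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(help="Selection of the model.", dest="model_name")

    # Illustrative subset of flags; types and defaults are assumptions.
    esn_parser = subparsers.add_parser("esn")
    esn_parser.add_argument("--mode", choices=["train", "test", "all"], default="all")
    esn_parser.add_argument("--system_name", type=str, required=True)
    esn_parser.add_argument("--N", type=int, required=True)
    return parser


# Equivalent to: python3 RUN.py esn --mode all --system_name Lorenz3D --N 100000
args = define_parser().parse_args(
    ["esn", "--mode", "all", "--system_name", "Lorenz3D", "--N", "100000"]
)
params = vars(args)  # same dictionary as args.__dict__
print(params)
```

Because `dest="model_name"` is set on the subparsers, the chosen sub-command ends up in the same dictionary as the other parameters, which is what `getModel` dispatches on.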

Output:

# test.txt
model_name:RNN-esn_auto-RDIM_1-N_used_1000-SIZE_1000-D_10.0-RADIUS_0.6-SIGMA_1.0-DL_200-NL_10-IPL_500-REG_0.0-WID_0:num_test_ICS:2.00:num_accurate_pred_005_avg_TEST:20.00:num_accurate_pred_050_avg_TEST:82.00:num_accurate_pred_005_avg_TRAIN:14.00:num_accurate_pred_050_avg_TRAIN:93.50:error_freq_TRAIN:4.68:error_freq_TEST:2.14
# train.txt
model_name:RNN-esn_auto-RDIM_1-N_used_1000-SIZE_1000-D_10.0-RADIUS_0.6-SIGMA_1.0-DL_200-NL_10-IPL_500-REG_0.0-WID_0:memory:110.07:total_training_time:1.02:n_model_parameters:12000:n_trainable_parameters:1000
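Both result files appear to use a flat `key:value:key:value` layout on a single line. Under that assumption (inferred only from the two lines shown here), a record can be turned into a dictionary as sketched below; `parse_result_line` is a hypothetical helper and not part of RNN-RC-Chaos.

```python
def parse_result_line(line):
    """Parse 'key:value:key:value...' into a dict, converting numeric values to float."""
    tokens = line.strip().split(":")
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of colon-separated fields")
    record = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        try:
            record[key] = float(value)
        except ValueError:
            record[key] = value  # non-numeric fields such as model_name stay as text
    return record


train_line = (
    "model_name:RNN-esn_auto-RDIM_1-N_used_1000-SIZE_1000-D_10.0-RADIUS_0.6-"
    "SIGMA_1.0-DL_200-NL_10-IPL_500-REG_0.0-WID_0:memory:110.07:"
    "total_training_time:1.02:n_model_parameters:12000:n_trainable_parameters:1000"
)
train_record = parse_result_line(train_line)
print(train_record["total_training_time"], train_record["n_trainable_parameters"])
```

This only works as long as no value contains a colon, which holds for the two lines above.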

PyTorch

Installation

conda create -n python_3.9.1 python=3.9.1
conda activate python_3.9.1
conda install setuptools cffi typing_extensions future six requests dataclasses pkg-config libuv pyyaml

An alternative method (I do not yet know the difference; it mainly adds conda-forge):

conda create -n py39 -c conda-forge -y # create the environment using the conda-forge channel
conda install python=3.9.1 # run this after activating the environment
Check the environments with conda info --envs:
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=11.1 CC=clang CXX=clang++ python setup.py install
# Build torch (CPU); this takes about half an hour.

Linux GPU version (CUDA 10.1): see the installation guide on the official site: https://pytorch.org/get-started/locally/#linux-verification

conda install pytorch torchvision torchaudio cudatoolkit=10.1 -c pytorch
# The installed version is PyTorch 1.7.1

Managing the stack with Lambda Stack (reference):

# I first reinstalled the OS, then installed the 455 GPU driver and rebooted

LAMBDA_REPO=$(mktemp) && \
wget -O${LAMBDA_REPO} https://lambdalabs.com/static/misc/lambda-stack-repo.deb && \
sudo dpkg -i ${LAMBDA_REPO} && rm -f ${LAMBDA_REPO} && \
sudo apt-get update && sudo apt-get install -y lambda-stack-cuda

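# Finally, note that this setup uses the system Python rather than Anaconda, so the Anaconda setup in bashrc is disabled for now and enabled only when needed.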


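For jupyter-notebook I created a dedicated image (see reference) to make projects easier to manage.

The most complete guide to Python virtual environments: https://zhuanlan.zhihu.com/p/60647332

If a library is already installed locally, the virtual environment picks it up automatically during pip install (when the environment is created with --system-site-packages).

Local library locations: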

pip install --> /home/wjq/.local/lib/python3.8/site-packages

sudo pip install --> /usr/local/lib/python3.8/dist-packages/pybind11/include

To find where a package is installed: >>> import XXX; >>> XXX.get_include()

sudo make install --> /usr/bin/include and /usr/bin/share

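Note:

virtualenv project_env --system-site-packages # include the locally installed libraries; otherwise you get a bare environment with only Python, i.e. without the torch or tensorflow packages
# This way, libraries installed locally are directly available to the project, while libraries installed in the virtual environment are used only there.
jupyter-notebook # from here on, everything done in the notebook happens inside the virtual environment and does not affect the local installation. Nice!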
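To confirm from inside a notebook which interpreter and which site-packages directories are actually in use, the standard library is enough. This is a minimal sketch; `in_virtualenv` is a hypothetical helper, and the `real_prefix` check is there because older virtualenv releases set that attribute instead of `base_prefix`.

```python
import site
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv or virtualenv."""
    if hasattr(sys, "real_prefix"):  # set by older virtualenv releases
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
print("user site-packages:", site.getusersitepackages())

# Directories searched for imports, in order; with --system-site-packages the
# system-wide directories appear after the environment's own site-packages.
package_dirs = [path for path in sys.path if "packages" in path]
for path in package_dirs:
    print("  ", path)
```

Running this once in the base interpreter and once in the activated environment shows where an import such as torch would be resolved from.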

Benchmarking different hardware

Test 1: CPU: Linux vs. Mac M1

jupyter notebook # start jupyter notebook
# Install any missing package directly with !pip install ***
# Test code:

from tqdm import tqdm
import torch

@torch.jit.script
def foo():
    x = torch.ones((1024 * 12, 1024 * 12), dtype=torch.float32)
    y = torch.ones((1024 * 12, 1024 * 12), dtype=torch.float32)
    z = x + y
    return z


if __name__ == '__main__':
    z0 = None
    for _ in tqdm(range(10000000000)):
        zz = foo()
        if z0 is None:
            z0 = zz
        else:
            z0 += zz

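Results:

Linux, 24 cores, CPU: 30 it/s

Mac M1, 4 cores, CPU: 26 it/s. Why is it not running at full load? The reason is unclear for now.

Test 2: Linux: CPU vs. GPU

Solve the linear system Ax = b. Code reference: here

The GPU only pays off once the matrix is fairly large (200+ 200). Although the GPU is very fast on large matrices, its memory is limited, so beyond a certain size it runs at full load and also slows down.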

Matrix size (1000 iterations each)   GPU (24 GB, 350 W, CUDA 11.1)   CPU (single core)
2000*2000                            11s : 10%                       14s
3000*3000                            10s                             24s
4000*4000                            10s                             80s
5000*5000                            10s                             127s
6000*6000                            11s                             210s
7000*7000                            11s                             270s
8000*8000                            10s                             370s
9000*9000                            10s                             460s
10000*10000                          14s : 82%                       600s
11000*11000                          17s : 91%                       840s
12000*12000                          19s : 91%, memory 2877MB        1025s
13000*13000                          22s : 89%, 3526MB               1158s
14000*14000                          25s : 89%, 3522MB               1371s
15000*15000                          36s : 91%, 3520MB               1612s
16000*16000                          42s : 91%, 4498MB               1761s
17000*17000                          46s : 94%, 5602MB               2057s
18000*18000                          48s : 94%, 6851MB
19000*19000
20000*20000                          61s : 95%                       55%
200000*200000                        Out of memory
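The out-of-memory result in the last row follows from the size of the matrix alone. The sketch below estimates the storage for a single dense n × n matrix, assuming 4-byte float32 elements (the benchmark's actual dtype is not stated, so this is an assumption) and comparing against a nominal 24 GiB card; `dense_matrix_gib` is a hypothetical helper.

```python
def dense_matrix_gib(n, bytes_per_element=4):
    """Storage in GiB for one dense n x n matrix (float32 by default)."""
    return n * n * bytes_per_element / 2**30


GPU_MEMORY_GIB = 24  # nominal capacity of the card used in the table

for n in (2000, 20000, 200000):
    size = dense_matrix_gib(n)
    verdict = "fits" if size < GPU_MEMORY_GIB else "does not fit"
    print(f"{n} x {n}: {size:.2f} GiB ({verdict} in {GPU_MEMORY_GIB} GiB)")
```

A solver also needs workspace beyond the matrix itself, so the real limit is reached well before a single matrix fills the card.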

Performance improvements

Installing torch-geometric

pip install torch-scatter # if this fails because the pybind11.h header is missing, build the C++ library manually first
pip install torch-sparse
pip install torch-cluster
pip install torch-spline-conv
pip install torch-geometric
# Before that, build pybind11 manually
git clone https://github.com/pybind/pybind11
cd pybind11
mkdir build
cd build
cmake ..
make check -j 4 # build and run the checks (optional)
sudo make install

RuntimeError: CUDA out of memory. Tried to allocate 5.75 GiB (GPU 0; 23.69 GiB total capacity; 17.45 GiB already allocated; 4.84 GiB free; 17.47 GiB reserved in total by PyTorch)


Intel oneAPI installation (reference) and GPU OpenCL:

. /opt/intel/oneapi/setvars.sh # first load the oneAPI environment
oneapi-cli

Tutorial reference: step-by-step

Some bugs encountered while using TensorFlow

  • 2020-01-03: While testing xunhuang's code, a bug similar to the one described at https://blog.csdn.net/PerfeyCui/article/details/108994385 appeared:

    AttributeError: module…ops' has no attribute '_TensorLike', ValueError: updates argument..eager.

    Resolved by following the blog post.

    Cause: the combination of keras and tensorflow.

References

1. 2019 SIAM Conference on Applications of Dynamical Systems
2. https://github.com/pvlachas/RNN-RC-Chaos
3. https://zhuanlan.zhihu.com/p/335638792
4. Backpropagation Algorithms and Reservoir Computing in Recurrent Neural Networks for the Forecasting of Complex Spatiotemporal Dynamics
5. Machine Learning for Fluid Mechanics
6. Mengying Wang and Maziar S. Hemati. Detecting exotic wakes with hydrodynamic sensors
7. Wei Hou, Darwin Darakananda, and Jeff D. Eldredge. Machine Learning Based Detection of Flow Disturbances Using Surface Pressure Measurements. AIAA conference, 2019
8. Kiran Ramesh, Ashok Gopalarathnam, Kenneth Granlund, Michael V. Ol, and Jack R. Edwards. Discrete-vortex method with novel shedding criterion for unsteady aerofoil flows with intermittent leading-edge vortex shedding. Journal of Fluid Mechanics, 751:500–538, 2014
9. Data-assimilated low-order vortex modeling of separated flows. Physical Review Fluids 3, 124701 (2018)