[MindSpore 6th Two-Day Training Camp] MindElec Assignment Notes

Posted by 张辉 (Zhang Hui) on 2021/11/09 08:07:59

[Abstract] MindElec is one of the new features in MindSpore 1.5 and is part of the MindScience scientific computing suite.

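The 6th MindSpore Two-Day Training Camp ran on Bilibili from November 6 to November 7, 2021. If you missed the live stream at https://live.bilibili.com/22127570, the recordings are available here:

Day 1:

6th Two-Day Training Camp | MindSpore AI Electromagnetic Simulation https://www.bilibili.com/video/BV1Y34y1Z7E8?spm_id_from=333.999.0.0

6th Two-Day Training Camp | MindSpore Parallelism for Large-Model Training https://www.bilibili.com/video/BV193411b7on?spm_id_from=333.999.0.0

6th Two-Day Training Camp | MindSpore Boost: Make Your Training Fly https://www.bilibili.com/video/BV1c341187ML?spm_id_from=333.999.0.0

Day 2:

6th Two-Day Training Camp | MindSpore Control Flow Overview https://www.bilibili.com/video/BV1A34y1d7G7?spm_id_from=333.999.0.0

6th Two-Day Training Camp | MindSpore Lite 1.5 Released, Bringing a New On-Device AI Experience https://www.bilibili.com/video/BV1f34y1o7mR?spm_id_from=333.999.0.0

6th Two-Day Training Camp | Visual Cluster Tuning Released: Tune Everything from LeNet to the PanGu Large Model https://www.bilibili.com/video/BV1dg411K7Nb?spm_id_from=333.999.0.0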

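Let's start with the first lecture of Day 1: MindElec, the electromagnetic simulation suite in MindScience.

The assignment for the first lecture is as follows:

Zhang Xiaobai (张小白) has in fact already tried MindSPONGE, the molecular simulation suite in MindScience:

Links:

Forum: https://bbs.huaweicloud.cn/forum/forum.php?mod=viewthread&tid=159269

Blog: https://bbs.huaweicloud.cn/blogs/302842

However, since Assignment 2 requires MindElec electromagnetic simulation, Assignment 1 can be done with MindElec as well.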

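1. Purchase an ECS GPU cloud server

We use an ECS GPU cloud server for the MindElec part of this assignment. For the MindSPONGE part, see the links above.

In the Huawei Cloud console, go to ECS, switch to the CN North-Beijing4 region, and purchase a server as shown in the figure below:

Click Buy Now:

At a little over 7 yuan per hour, Zhang Xiaobai wasted no time logging in.

First, a quick look at the memory and the CUDA version: 11.0.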

2. Install Anaconda

MindSpore has traditionally used Python 3.7.5 (Python 3.9 support came later), so start by installing conda:

...

source ~/.bashrc

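The installed version turned out to be too old, so I downloaded the latest Anaconda instead:

After downloading, upload the installer to the server and run: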

bash ./Anaconda3-2021.05-Linux-x86_64.sh

Naturally, the installer complains that the directory already exists, so remove it:

rm -rf /root/anaconda3

Then run the installer again:

bash ./Anaconda3-2021.05-Linux-x86_64.sh

3. Create the mindspore1.5 conda environment

conda create -n mindspore1.5 python=3.7.5

...

conda activate mindspore1.5

conda install -c conda-forge pythonocc-core=7.5.1 cudatoolkit=11.1

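Press Y to continue:

The cudatoolkit 11.1 package from conda is fairly large (1.2 GB), so be patient while it downloads.

pythonocc-core is installed by the same command.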
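Both the MindSpore GPU wheel and the MindElec wheel installed below are tagged cp37, so they only install into CPython 3.7. A quick check run inside the activated environment catches a wrong interpreter before pip does. This is a minimal sketch of my own, not part of MindSpore; the function name is illustrative:

```python
import sys


def interpreter_matches(expected=(3, 7), version_info=None):
    """Return True if the Python major.minor version equals `expected`.

    `version_info` defaults to the running interpreter; pass a tuple to test.
    (3, 7) is the default because the wheels used in this post are tagged cp37.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    running = ".".join(str(n) for n in sys.version_info[:3])
    status = "OK" if interpreter_matches() else "mismatch, expected 3.7.x"
    print(f"Python {running}: {status}")
```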

4. Install the GPU version of MindSpore 1.5

pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/MindSpore/gpu/x86_64/cuda-11.1/mindspore_gpu-1.5.0-cp37-cp37m-linux_x86_64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
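The wheel URL above has a visible structure: release, framework, device, architecture, CUDA version, then the wheel file itself. The sketch below assembles it from those parts, which makes the CUDA version the wheel targets explicit. The layout is inferred from this single URL and is an assumption for any other release or CUDA version:

```python
BASE = "https://ms-release.obs.cn-north-4.myhuaweicloud.com"


def mindspore_gpu_wheel_url(version="1.5.0", cuda="11.1", py="37", base=BASE):
    """Build a MindSpore GPU wheel URL for Linux x86_64.

    Assumption: the directory layout is copied from the 1.5.0 / CUDA 11.1 URL
    used in this post; other releases may be organized differently.
    """
    # CPython 3.7 and older carry an "m" ABI flag (cp37m); 3.8 and newer do not.
    abi = f"cp{py}m" if int(py) <= 37 else f"cp{py}"
    wheel = f"mindspore_gpu-{version}-cp{py}-{abi}-linux_x86_64.whl"
    return f"{base}/{version}/MindSpore/gpu/x86_64/cuda-{cuda}/{wheel}"


if __name__ == "__main__":
    print(mindspore_gpu_wheel_url())
```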

5. Install MindElec

We'll simply install the MindElec package provided on the official site. Its name says ascend, but the instructor said it works on GPU as well.

wget https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.5.0/MindScience/x86_64/mindscience_mindelec_ascend-0.1.0-cp37-cp37m-linux_x86_64.whl

pip install ./mindscience_mindelec_ascend-0.1.0-cp37-cp37m-linux_x86_64.whl -i https://pypi.tuna.tsinghua.edu.cn/simple
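Why does pip accept a wheel with "ascend" in its name on a GPU machine at all? Under the wheel naming convention (PEP 427), a file name is {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl. Pip decides compatibility from the last three tags only. A minimal parser shows that "ascend" sits in the distribution name, while the platform tag is plain linux_x86_64. Whether the code then runs on GPU is a separate question; here we rely on the instructor's word.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its PEP 427 fields.

    Layout: {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return {"name": name, "version": version, "build": build,
            "python": py_tag, "abi": abi_tag, "platform": platform_tag}


if __name__ == "__main__":
    info = parse_wheel_filename(
        "mindscience_mindelec_ascend-0.1.0-cp37-cp37m-linux_x86_64.whl")
    # prints: mindscience_mindelec_ascend cp37 cp37m linux_x86_64
    print(info["name"], info["python"], info["abi"], info["platform"])
```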

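Verify the installation:

It failed. The CUDA on the machine is 11.0, whereas the MindSpore wheel above was built for CUDA 11.1, and cuDNN does not seem to be installed either.

6. Install CUDA 11.1 and the matching cuDNN 8.0.5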

wget https://developer.download.nvidia.com/compute/cuda/11.1.0/local_installers/cuda_11.1.0_455.23.05_linux.run

sh cuda_11.1.0_455.23.05_linux.run

Choose the options as follows:

As the installer prompts (see the figure), edit ~/.bashrc:

Add /usr/local/cuda-11.1/bin to PATH
Add /usr/local/cuda-11.1/lib64 to LD_LIBRARY_PATH
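Both ~/.bashrc additions amount to putting a directory on a colon-separated search path. A small Python sketch of that operation follows. Prepending rather than appending is my own choice, so that the 11.1 directories take precedence over any older CUDA directories already on the path:

```python
import os


def prepend_path(current, entry, sep=os.pathsep):
    """Return `current` with `entry` placed first, dropping any duplicate of it."""
    parts = [p for p in (current or "").split(sep) if p and p != entry]
    return sep.join([entry] + parts)


if __name__ == "__main__":
    # Changes made here reach only this process and its children. The dynamic
    # loader reads LD_LIBRARY_PATH at process start, so only child processes see
    # the new value; ~/.bashrc is what makes the setting persistent.
    os.environ["PATH"] = prepend_path(os.environ.get("PATH"), "/usr/local/cuda-11.1/bin")
    os.environ["LD_LIBRARY_PATH"] = prepend_path(
        os.environ.get("LD_LIBRARY_PATH"), "/usr/local/cuda-11.1/lib64")
    print(os.environ["LD_LIBRARY_PATH"])
```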

Check the CUDA version again:

nvidia-smi

It is 11.1 now.
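The "CUDA Version" field in nvidia-smi reports the newest CUDA the installed driver supports. That is not necessarily the toolkit on PATH. The toolkit itself can be checked with nvcc --version, whose output includes a line of the form "Cuda compilation tools, release X.Y, ...". A sketch that extracts X.Y from that output:

```python
import re
import shutil
import subprocess


def toolkit_version(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH; check the ~/.bashrc changes")
    else:
        out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout
        print(toolkit_version(out))
```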

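Download cuDNN 8.0.5 for CUDA 11.1 (other cuDNN versions work too, as long as they are built for CUDA 11.1) and upload it to the server:

Extract it:

tar -zxvf cudnn-11.1-linux-x64-v8.0.5.39.tgz

Copy the extracted files into the corresponding CUDA directories: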
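The post shows the copy commands only in a screenshot. As a hedged sketch, the following Python does what NVIDIA's tar-file install instructions describe: copy cudnn*.h into include/ and libcudnn* into lib64/, keep symlinks as links, and make the files world-readable. The directory layout is an assumption based on how this tarball is normally packed, so check the tar output above before running it:

```python
import os
import shutil
import stat
from pathlib import Path


def install_cudnn(extracted_root, cuda_home):
    """Copy cuDNN headers and libraries into a CUDA install; return copied paths.

    Assumes `extracted_root` holds include/ and lib64/ subdirectories (for this
    tarball that should be the ./cuda directory created by tar). Symlinks such as
    libcudnn.so.8 are recreated as links, as `cp -P` would do.
    """
    copied = []
    for sub, pattern in (("include", "cudnn*.h"), ("lib64", "libcudnn*")):
        dst_dir = Path(cuda_home) / sub
        dst_dir.mkdir(parents=True, exist_ok=True)
        for src in sorted((Path(extracted_root) / sub).glob(pattern)):
            dst = dst_dir / src.name
            if dst.exists() or dst.is_symlink():
                dst.unlink()  # replace leftovers from an earlier copy
            shutil.copy2(src, dst, follow_symlinks=False)
            if not dst.is_symlink():
                # Equivalent of `chmod a+r`: make the file readable by all users.
                os.chmod(dst, stat.S_IMODE(dst.stat().st_mode) | 0o444)
            copied.append(dst)
    return copied
```

Run it as root from the directory where the archive was extracted, for example install_cudnn("cuda", "/usr/local/cuda-11.1").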

7. Verify the MindSpore 1.5 and MindElec installations

python -c "import mindspore;mindspore.run_check()"

Or write a small test script with vi test.py and run it:

python test.py

Verify the MindElec installation:

python -c 'import mindelec'
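You can also confirm that the packages from this post, plus the example dependencies installed in step 9, are visible to the active interpreter. The import system can locate a module without importing it. A minimal sketch follows. It only checks that the modules can be located, so a CUDA or cuDNN problem would still show up only on a real import or in run_check():

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that the import system cannot locate.

    find_spec only looks a module up; it does not execute it, so load-time
    errors (for example a missing shared library) are not detected here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # opencv-python is imported as cv2.
    print(missing_modules(["mindspore", "mindelec", "easydict", "cv2"]))
```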

Everything seems to be in place.

So can the MindElec examples actually run?

8. Clone the code repository (MindElec lives in the mindscience repo)

git clone https://gitee.com/mindspore/mindscience.git

9. Install dependencies

(1) Install easydict

(2) Install opencv

pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple

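10. Verification

(1) Data-driven parameterized electromagnetic simulation:

https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/data_driven/parameterization

For every experiment below, change Ascend to GPU in the relevant code before running it. I won't repeat this each time.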
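The post doesn't show exactly which lines were changed. The sketch below does it in bulk. It assumes (hypothetically) that each example picks its device with a literal device_target="Ascend" argument, as in context.set_context(device_target="Ascend"). Scripts that read the target from a command-line default or a config file need those places edited instead. Review the diff before training:

```python
import re
from pathlib import Path

# Hypothetical pattern: a literal device_target="Ascend" or device_target='Ascend'.
_ASCEND = re.compile(r"""device_target\s*=\s*(["'])Ascend\1""")


def to_gpu(source):
    """Rewrite device_target="Ascend" to device_target="GPU", keeping the quote style."""
    return _ASCEND.sub(lambda m: f"device_target={m.group(1)}GPU{m.group(1)}", source)


def patch_tree(root):
    """Apply to_gpu to every .py file under `root` in place; return the changed files."""
    changed = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        new_text = to_gpu(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            changed.append(path)
    return changed
```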

...

Finally done:

The results are as follows:

epoch: 9966 step: 55, loss is 1.067301e-06
epoch time: 156.272 ms, per step time: 2.841 ms
epoch: 9967 step: 55, loss is 1.6718128e-06
epoch time: 161.586 ms, per step time: 2.938 ms
epoch: 9968 step: 55, loss is 1.9428162e-06
epoch time: 165.269 ms, per step time: 3.005 ms
epoch: 9969 step: 55, loss is 1.1494253e-06
epoch time: 160.396 ms, per step time: 2.916 ms
epoch: 9970 step: 55, loss is 1.2750754e-06
epoch time: 154.781 ms, per step time: 2.814 ms
epoch: 9971 step: 55, loss is 1.2550026e-06
epoch time: 160.627 ms, per step time: 2.920 ms
epoch: 9972 step: 55, loss is 1.4948789e-06
epoch time: 159.846 ms, per step time: 2.906 ms
epoch: 9973 step: 55, loss is 1.8957531e-06
epoch time: 164.061 ms, per step time: 2.983 ms
epoch: 9974 step: 55, loss is 1.8941449e-06
epoch time: 164.542 ms, per step time: 2.992 ms
epoch: 9975 step: 55, loss is 2.340197e-06
epoch time: 166.823 ms, per step time: 3.033 ms
epoch: 9976 step: 55, loss is 1.5545256e-06
epoch time: 152.811 ms, per step time: 2.778 ms
epoch: 9977 step: 55, loss is 9.994957e-07
epoch time: 171.435 ms, per step time: 3.117 ms
epoch: 9978 step: 55, loss is 2.12672e-06
epoch time: 154.989 ms, per step time: 2.818 ms
epoch: 9979 step: 55, loss is 1.5981371e-06
epoch time: 159.917 ms, per step time: 2.908 ms
epoch: 9980 step: 55, loss is 1.6546201e-06
epoch time: 151.021 ms, per step time: 2.746 ms
epoch: 9981 step: 55, loss is 1.5869264e-06
epoch time: 162.313 ms, per step time: 2.951 ms
epoch: 9982 step: 55, loss is 1.1969032e-06
epoch time: 168.984 ms, per step time: 3.072 ms
epoch: 9983 step: 55, loss is 1.1927513e-06
epoch time: 163.749 ms, per step time: 2.977 ms
epoch: 9984 step: 55, loss is 1.0608298e-06
epoch time: 160.595 ms, per step time: 2.920 ms
epoch: 9985 step: 55, loss is 1.964669e-06
epoch time: 155.398 ms, per step time: 2.825 ms
epoch: 9986 step: 55, loss is 1.5706166e-06
epoch time: 165.935 ms, per step time: 3.017 ms
epoch: 9987 step: 55, loss is 1.3382705e-06
epoch time: 163.523 ms, per step time: 2.973 ms
epoch: 9988 step: 55, loss is 1.2119517e-06
epoch time: 168.339 ms, per step time: 3.061 ms
epoch: 9989 step: 55, loss is 1.7882771e-06
epoch time: 159.096 ms, per step time: 2.893 ms
epoch: 9990 step: 55, loss is 1.1589409e-06
epoch time: 160.459 ms, per step time: 2.917 ms
epoch: 9991 step: 55, loss is 8.78855e-07
epoch time: 156.461 ms, per step time: 2.845 ms
epoch: 9992 step: 55, loss is 1.3546548e-06
epoch time: 157.824 ms, per step time: 2.870 ms
epoch: 9993 step: 55, loss is 3.1089023e-06
epoch time: 158.035 ms, per step time: 2.873 ms
epoch: 9994 step: 55, loss is 1.4939134e-06
epoch time: 160.428 ms, per step time: 2.917 ms
epoch: 9995 step: 55, loss is 2.164372e-06
epoch time: 155.159 ms, per step time: 2.821 ms
epoch: 9996 step: 55, loss is 9.635824e-07
epoch time: 156.919 ms, per step time: 2.853 ms
epoch: 9997 step: 55, loss is 1.0471658e-06
epoch time: 160.262 ms, per step time: 2.914 ms
epoch: 9998 step: 55, loss is 1.4574234e-06
epoch time: 160.660 ms, per step time: 2.921 ms
epoch: 9999 step: 55, loss is 2.0352143e-06
epoch time: 150.130 ms, per step time: 2.730 ms
epoch: 10000 step: 55, loss is 9.816508e-07
epoch time: 156.031 ms, per step time: 2.837 ms
Eval   current epoch: 10000  loss: 0.0002412886533234922  l2_s11: 0.0030976369803562306
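The log above is long, and a short parser makes it easier to summarize. A minimal sketch based only on the two line formats printed above follows. It also shows where "per step time" comes from: epoch time divided by the 55 steps per epoch.

```python
import re
from statistics import mean

# Line formats copied from the training output above.
STEP_RE = re.compile(r"epoch: (\d+) step: (\d+), loss is ([-+.\deE]+)")
TIME_RE = re.compile(r"epoch time: ([\d.]+) ms, per step time: ([\d.]+) ms")


def summarize(log_text):
    """Summarize loss and timing from training output in the format shown above.

    Assumes each "epoch: ... loss is ..." line is followed by its
    "epoch time: ..." line, so the two lists line up epoch by epoch.
    """
    steps, losses, epoch_ms = [], [], []
    for line in log_text.splitlines():
        step_match = STEP_RE.search(line)
        if step_match:
            steps.append(int(step_match.group(2)))
            losses.append(float(step_match.group(3)))
            continue
        time_match = TIME_RE.search(line)
        if time_match:
            epoch_ms.append(float(time_match.group(1)))
    if not losses or not epoch_ms:
        return None
    return {
        "epochs": len(losses),
        "last_loss": losses[-1],
        "min_loss": min(losses),
        "mean_epoch_ms": mean(epoch_ms),
        # Per-step time is epoch time / steps per epoch, e.g. 156.272 / 55 = 2.841 ms.
        "mean_step_ms": mean(t / s for t, s in zip(epoch_ms, steps)),
    }
```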

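The ckpt directory should contain the trained model:

eval_res contains 49 images:

Downloaded, they look like this:

(2) Physics-driven AI solution of the frequency-domain Maxwell equations: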

https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/physics_driven/frequency_domain_maxwell

cd ~/mindscience/MindElec/examples/physics_driven/frequency_domain_maxwell

python solve.py

...

The results are as follows:

(mindspore1.5) root@ecs-zhanghui-gpu:~/mindscience/MindElec/examples/physics_driven/frequency_domain_maxwell# python solve.py
pid: 2676
check test dataset shape: (10201, 2), (10201, 1)
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.369.176 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 679_75_construct.92, J user: 679_75_construct.92:construct{[0]: [CNode]93, [1]: x0, [2]: u}
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.382.175 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 622_132_construct.94, J user: 622_132_construct.94:construct{[0]: [CNode]95, [1]: x0, [2]: u}
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.595.722 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 894_465_7_construct.116, J user: 894_465_7_construct.116:construct{[0]: [CNode]117, [1]: [CNode]118, [2]: [CNode]119}
[WARNING] OPTIMIZER(2676,7feb1bba3740,python):2021-11-09-00:05:06.614.336 [mindspore/ccsrc/frontend/optimizer/ad/dfunctor.cc:803] GetPrimalUser] J operation has no relevant primal call in the same graph. Func graph: 894_465_7_construct.116, J user: 894_465_7_construct.116:construct{[0]: [CNode]120, [1]: [CNode]118, [2]: [CNode]121}
[WARNING] CORE(2676,7feb1bba3740,python):2021-11-09-00:05:07.738.476 [mindspore/core/ir/anf_extends.cc:65] fullname_with_scope] Input 0 of cnode is not a value node, its type is CNode.
epoch: 1 step: 78, loss is 600.0
epoch time: 11268.853 ms, per step time: 144.472 ms
epoch: 2 step: 78, loss is 225.4
epoch time: 1389.687 ms, per step time: 17.816 ms
epoch: 3 step: 78, loss is 199.9
================================Start Evaluation================================
Total prediction time: 0.19255661964416504 s
l2_error:  0.20626515080160301
=================================End Evaluation=================================
epoch time: 1610.614 ms, per step time: 20.649 ms
epoch: 4 step: 78, loss is 10.19
epoch time: 1730.271 ms, per step time: 22.183 ms
epoch: 5 step: 78, loss is 2.803
epoch time: 1429.185 ms, per step time: 18.323 ms
epoch: 6 step: 78, loss is 2.316
================================Start Evaluation================================
Total prediction time: 0.0025403499603271484 s
l2_error:  0.019291123630052236
=================================End Evaluation=================================
epoch time: 1420.687 ms, per step time: 18.214 ms
epoch: 7 step: 78, loss is 2.2
epoch time: 1844.602 ms, per step time: 23.649 ms
epoch: 8 step: 78, loss is 1.953
epoch time: 1408.553 ms, per step time: 18.058 ms
epoch: 9 step: 78, loss is 1.856
================================Start Evaluation================================
Total prediction time: 0.0025916099548339844 s
l2_error:  0.015916268073532643
=================================End Evaluation=================================
epoch time: 1404.208 ms, per step time: 18.003 ms
epoch: 10 step: 78, loss is 1.33
epoch time: 1459.013 ms, per step time: 18.705 ms
l2 error: 0.0159162681
per step time: 18.7052916258
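In this run an evaluation block appears after epochs 3, 6 and 9. Over those blocks, l2_error falls from about 0.206 to 0.0159, and the final "l2 error:" line repeats the last value. A small helper can pull these values out of a saved log and check that they only go down:

```python
import re

# Matches the "l2_error:  <value>" lines inside each Start/End Evaluation block.
L2_RE = re.compile(r"l2_error:\s+([-+.\deE]+)")


def l2_history(log_text):
    """Return the l2_error values from each evaluation block, in order."""
    return [float(value) for value in L2_RE.findall(log_text)]


def is_non_increasing(values):
    """True if every value is less than or equal to the one before it."""
    return all(later <= earlier for earlier, later in zip(values, values[1:]))
```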

(3) Physics-driven AI solution of the point-source Maxwell equations (incremental learning example)

https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/physics_driven/incremental_learning

cd ~/mindscience/MindElec/examples/physics_driven/incremental_learning

After switching to GPU, run:

python piad.py --mode=pretrain

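...

Wait patiently:

Then I suddenly noticed that pretraining is set to 3000 epochs:

With Zhang Xiaobai's wallet being rather thin, I stopped the training, as you might expect:

I'd guess the MindSpore team worked out that it takes the full 3000 epochs to bring the loss below 0.1. The loss is converging now, but it is still fairly high.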
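Before committing to a long run on a pay-per-use instance, a back-of-the-envelope estimate helps: hours = epochs × seconds per epoch / 3600, and cost = hours × hourly price. The per-epoch time of piad.py is not recorded in this post, so the 12 s below is a made-up placeholder; the price is the roughly 7 yuan per hour mentioned in step 1:

```python
def training_cost(epochs, seconds_per_epoch, price_per_hour):
    """Estimate wall-clock hours and instance cost for a training run.

    Ignores one-off time such as a slower first epoch or evaluation,
    so treat the result as a lower bound.
    """
    hours = epochs * seconds_per_epoch / 3600
    return hours, hours * price_per_hour


# Hypothetical numbers: 12 s per epoch (not measured here) at about 7 yuan per hour.
hours, cost = training_cost(3000, 12.0, 7.0)
print(f"{hours:.1f} h, about {cost:.0f} yuan")  # 10.0 h, about 70 yuan
```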

(4) Physics-driven AI solution of the point-source Maxwell equations (time-domain example)

https://gitee.com/mindspore/mindscience/tree/master/MindElec/examples/physics_driven/time_domain_maxwell

cd ~/mindscience/MindElec/examples/physics_driven/time_domain_maxwell

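Switch to GPU.

Having learned from the previous experiment, I decisively edited the configuration to reduce the number of epochs:

The epoch count went from 6000 down to 100.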
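The post doesn't show which file was edited. If the example keeps its settings in a JSON file, the change can be scripted and a backup of the original kept. That is an assumption: the file name and key used below are placeholders, not taken from the repository.

```python
import json
import shutil
from pathlib import Path


def set_config_value(config_path, key, value):
    """Set a top-level field in a JSON config file and return its previous value.

    Hypothetical: assumes the example reads its settings from a JSON file; the
    real file name and the name of the epoch field must be checked in the repo.
    A .bak copy of the original file is kept next to it.
    """
    path = Path(config_path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get(key)
    config[key] = value
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return previous
```

For example, set_config_value("config.json", "epochs", 100), where both names are placeholders.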

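Start training:

100 epochs go by fairly quickly.

Likewise, even with far fewer epochs the loss is clearly converging. After the full 6000 epochs the model would presumably be fully geared up.

But Zhang Xiaobai is not about to spend his hard-earned money finding out, so shutting down the server and walking away was the best way out.

And with that, the MindElec part of the MindScience assignment is essentially done.

(The end. Thanks for reading.)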
