Save method of CRUDRepository is very slow?
Problem description
I want to store some data in my Neo4j database. I use spring-data-neo4j for that.
My code looks like this:
for (int i = 0; i < newRisks.size(); i++) {
    myRepository.save(newRisks.get(i));
    System.out.println("saved " + newRisks.get(i).name);
}
My newRisks list contains about 60,000 objects and 60,000 edges. Every node and edge has one property. This loop takes about 15-20 minutes; is this normal? I used Java VisualVM to look for bottlenecks, but my average CPU usage was 10-25% (of 4 cores) and my heap was less than half full.
Are there any options to speed up this operation?
Additionally, on the first call of myRepository.save(newRisks.get(i)), the JVM seems to fall asleep for several minutes before the first output appears.
Second edit:
Class Risk:
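import java.util.HashSet;
import java.util.Set;

import org.neo4j.graphdb.Direction;
import org.springframework.data.neo4j.annotation.Indexed;
import org.springframework.data.neo4j.annotation.NodeEntity;
import org.springframework.data.neo4j.annotation.RelatedTo;

@NodeEntity
public class Risk {
    //...
    @Indexed
    public String name;

    // Outgoing CHILD relationships to the child risks
    @RelatedTo(type = "CHILD", direction = Direction.OUTGOING)
    Set<Risk> risk = new HashSet<Risk>();

    public void addChild(Risk child) {
        risk.add(child);
    }
    //...
}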
Creating the risks:
@Autowired
private Repository myRepository;

@Transactional
public Collection<Risk> makeSomeRisks() {
    ArrayList<Risk> newRisks = new ArrayList<Risk>();
    newRisks.add(new Risk("Root"));
    for (int i = 0; i < 60000; i++) {
        Risk risk = new Risk("risk " + (i + 1));
        newRisks.get(0).addChild(risk);
        newRisks.add(risk);
    }
    for (int i = 0; i < newRisks.size(); i++) {
        myRepository.save(newRisks.get(i));
    }
    return newRisks;
}
Recommended answer
The problem here is that you are doing mass inserts with an API that is not designed for that.
You create one root Risk with 60k children. You save the root first, which also persists all 60k children at the same time (and creates the relationships). That's why the first save takes so long. Then you save every child again.
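To see why the first call dominates, here is a toy model in plain Java, not SDN code. It assumes, as described above, that saving an entity cascades over its CHILD set and writes every reachable node and relationship, with no dirty checking; CascadeModel and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not SDN code: save() cascades over the CHILD set and writes every
// reachable node and relationship again, with no dirty checking (an assumption).
class CascadeModel {
    static final class Risk {
        final String name;
        final Set<Risk> children = new HashSet<>();

        Risk(String name) {
            this.name = name;
        }
    }

    long writes = 0; // node writes + relationship writes performed so far

    void save(Risk risk) {
        writes++; // write the node itself
        for (Risk child : risk.children) {
            writes++;    // write the CHILD relationship
            save(child); // cascade into the child
        }
    }

    /** Mirrors the question's loop; returns {writes in the first save, total writes}. */
    static long[] simulate(int childCount) {
        List<Risk> risks = new ArrayList<>();
        risks.add(new Risk("Root"));
        for (int i = 0; i < childCount; i++) {
            Risk risk = new Risk("risk " + (i + 1));
            risks.get(0).children.add(risk);
            risks.add(risk);
        }
        CascadeModel repo = new CascadeModel();
        repo.save(risks.get(0)); // first call: root plus the whole cascade
        long first = repo.writes;
        for (int i = 1; i < risks.size(); i++) {
            repo.save(risks.get(i)); // each child is saved a second time
        }
        return new long[] {first, repo.writes};
    }
}
```

In this model, simulate(60000) returns {120001, 180001}: two thirds of all writes happen inside the very first save() call, which fits the long pause before the first "saved" line is printed.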
There are a few ways to speed this up with SDN:
Don't use the collection approach for mass inserts. Instead, persist both participants and create each relationship explicitly with template.createRelationshipBetween(root, child, "CHILD", false);
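A minimal sketch of this approach, written against a hypothetical GraphStore interface so that it runs without Neo4j. In SDN the two operations would roughly correspond to saving the entity through the template and to the template.createRelationshipBetween(root, child, "CHILD", false) call mentioned above; the interface, its id-based signatures and the CountingStore class are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two operations this approach relies on:
// persisting one entity and creating one relationship between two persisted entities.
interface GraphStore {
    long saveNode(String name);

    void createRelationshipBetween(long startId, long endId, String type, boolean allowDuplicates);
}

// In-memory GraphStore that only records what would be written.
class CountingStore implements GraphStore {
    private long nextId = 0;
    long nodeWrites = 0;
    final List<long[]> relationships = new ArrayList<>();

    @Override
    public long saveNode(String name) {
        nodeWrites++;
        return nextId++;
    }

    @Override
    public void createRelationshipBetween(long startId, long endId, String type, boolean allowDuplicates) {
        // allowDuplicates is ignored in this sketch; each call records one relationship
        relationships.add(new long[] {startId, endId});
    }
}

class ExplicitRelationships {
    // Persist the root on its own, then persist each child and connect it with
    // a single explicit CHILD relationship instead of a 60k-element collection.
    static void insert(GraphStore store, int childCount) {
        long rootId = store.saveNode("Root");
        for (int i = 0; i < childCount; i++) {
            long childId = store.saveNode("risk " + (i + 1));
            store.createRelationshipBetween(rootId, childId, "CHILD", false);
        }
    }
}
```

Each node is written exactly once and each relationship is created explicitly, so no single save has to walk a 60,000-element collection.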
Persist the children first, then add all the persisted children to the root object and persist the root.
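A toy model of this ordering, again plain Java rather than SDN. It assumes that save() writes the node, writes one relationship per child, and cascades only into children that have not been persisted yet; the ChildrenFirst class is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not SDN internals: save() writes the node and one relationship per child,
// and cascades only into children that have not been persisted yet (an assumption).
class ChildrenFirst {
    static final class Risk {
        final String name;
        final Set<Risk> children = new HashSet<>();

        Risk(String name) {
            this.name = name;
        }
    }

    // Risk does not override equals/hashCode, so this set compares by identity.
    private final Set<Risk> persisted = new HashSet<>();
    long nodeWrites = 0;
    long relationshipWrites = 0;

    void save(Risk risk) {
        nodeWrites++;
        persisted.add(risk);
        for (Risk child : risk.children) {
            relationshipWrites++;
            if (!persisted.contains(child)) {
                save(child); // only reached for children that were never saved
            }
        }
    }

    static ChildrenFirst run(int childCount) {
        ChildrenFirst repo = new ChildrenFirst();
        List<Risk> children = new ArrayList<>();
        for (int i = 0; i < childCount; i++) {
            Risk child = new Risk("risk " + (i + 1));
            repo.save(child); // 1. persist each child on its own
            children.add(child);
        }
        Risk root = new Risk("Root");
        root.children.addAll(children); // 2. attach the already persisted children
        repo.save(root);                // 3. one save for the root and its relationships
        return repo;
    }
}
```

ChildrenFirst.run(60000) ends with 60,001 node writes and 60,000 relationship writes: every node is persisted once, and the single root save only has to add the relationships.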
As you did, use the Neo4j core API, but call template.postEntityCreation(node, Risk.class) so that you can access the entities via SDN. You then also have to index the entities yourself (db.index().forNodes("Risk").add(node, "name", name);), or use the Neo4j core API auto-index, but that is not compatible with SDN.
Whether you use the core API or SDN, you should use transaction sizes of around 10k-20k nodes/relationships for the best performance.
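A sketch of committing in fixed-size batches instead of one huge transaction. Tx and BatchedWriter are hypothetical names; with the Neo4j core API the Supplier would open a real transaction via db.beginTx(), and commit() would mark it successful and finish it (tx.success() followed by tx.finish() in the 1.x API).

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical transaction handle; a real implementation would wrap a Neo4j transaction.
interface Tx {
    void commit();
}

class BatchedWriter {
    /**
     * Applies write to every item, opening a new transaction for each run of at most
     * batchSize items and committing it before the next one starts.
     * Returns the number of transactions used.
     */
    static <T> int writeInBatches(List<T> items, int batchSize,
                                  Supplier<Tx> beginTx, Consumer<T> write) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int transactions = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            Tx tx = beginTx.get();
            transactions++;
            // If write throws, this batch is never committed; production code should
            // also roll back or finish the transaction in a finally block.
            for (T item : items.subList(start, end)) {
                write.accept(item);
            }
            tx.commit();
        }
        return transactions;
    }
}
```

For the 60,001 nodes plus 60,000 relationships in the question (120,001 writes) and a batch size of 10,000, writeInBatches uses 13 transactions, whereas the question's @Transactional method does everything in a single one.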