Improve documentation and support for saving in batches

tom.monnier · March 18, 2025, 9:17am

The documentation has a small section on save performance.

The low-level stuff should be part of the framework. I propose introducing a BatchSavingRepository for this purpose, or adding the functionality to an existing one.

void saveInBatches(List<Customer> entities) {
    SaveContext saveContext = new SaveContext().setDiscardSaved(true);
    for (int i = 0; i < entities.size(); i++) {
        saveContext.saving(entities.get(i));
        // save by 100 instances
        if ((i + 1) % 100 == 0 || i == entities.size() - 1) {
            dataManager.save(saveContext);
            saveContext = new SaveContext().setDiscardSaved(true);
        }
    }
}
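
For readers who want to check the chunking logic without a running Jmix application, here is a minimal plain-Java sketch. The class name `BatchSaver` and the `sink` callback are hypothetical; the callback only stands in for a call such as `dataManager.save(saveContext)` and is not part of any Jmix API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not a Jmix class: it only demonstrates the chunking.
final class BatchSaver {

    /**
     * Splits entities into chunks of at most batchSize elements and hands
     * each chunk to sink. In a real project the sink would build a
     * SaveContext and pass it to the data manager (assumption).
     * Returns the number of chunks handed to the sink.
     */
    static <T> int saveInBatches(List<T> entities, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushed = 0;
        for (int from = 0; from < entities.size(); from += batchSize) {
            int to = Math.min(from + batchSize, entities.size());
            // Copy the view so the sink never holds on to the source list.
            sink.accept(new ArrayList<>(entities.subList(from, to)));
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            ids.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        int flushed = saveInBatches(ids, 100, chunk -> sizes.add(chunk.size()));
        System.out.println(flushed + " batches, sizes " + sizes);
        // prints "3 batches, sizes [100, 100, 50]"
    }
}
```

Stepping by `batchSize` instead of testing `(i + 1) % batchSize` avoids the special case for the last element, and an empty list simply produces no calls to the sink.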

krivopustov · March 19, 2025, 7:31am

The provided code looks trivial to implement in a project, and it gives a better understanding of the saving options, which could help in other scenarios.

So I’m not sure what is better here - to catch a fish for the user or to show them how to catch it themselves.

From your experience, how often in your projects do you need to save a lot of data in one go?

Regards,
Konstantin

tom.monnier · March 19, 2025, 4:22pm

In my past projects I needed it a few times but never actually solved it. That was a CUBA-based application where batch saving was not available, or at least not documented. It was an internal application, though, so we could get away with increasing the heap space until it was just enough.

In my new project, which has just started, I have the same need.
All those use cases are or were due to import functionality (Excel, REST, CSV).

In my new project, the first time I ran into it was also an import. The second time (today) was when copying a variation of that model. Our application simply has some one-to-many relations with many children; it’s really a consequence of the business model.

I even fear they may become so big that we will need to resort to a multi-transactional model in the end.
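
As a rough illustration of what such a model could look like, the plain-Java sketch below commits every chunk on its own and records which chunks failed, so one bad chunk does not undo the ones already saved. All names (`IndependentBatches`, `saveIndependently`, `commitChunk`) are hypothetical, and the sketch assumes that `commitChunk` runs each chunk in its own transaction; how that is achieved in a given Jmix project is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not a Jmix class.
final class IndependentBatches {

    /**
     * Hands each chunk to commitChunk, which is assumed to save the chunk
     * in its own transaction. A chunk that throws is recorded and skipped;
     * earlier and later chunks are unaffected.
     * Returns the zero-based indexes of the chunks that failed.
     */
    static <T> List<Integer> saveIndependently(List<T> entities, int batchSize,
                                               Consumer<List<T>> commitChunk) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> failedChunks = new ArrayList<>();
        int chunkIndex = 0;
        for (int from = 0; from < entities.size(); from += batchSize, chunkIndex++) {
            int to = Math.min(from + batchSize, entities.size());
            try {
                commitChunk.accept(new ArrayList<>(entities.subList(from, to)));
            } catch (RuntimeException e) {
                // Remember the chunk so the caller can retry or report it.
                failedChunks.add(chunkIndex);
            }
        }
        return failedChunks;
    }
}
```

The trade-off is that the import is no longer all-or-nothing: the caller has to decide whether failed chunks are retried, reported to the user, or compensated for.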

For now I have implemented my own BatchSavingRepositoryBean with the given default implementation and call it from my specific repos.

So for me it’s natural to make it part of the infrastructure.


@NoRepositoryBean
public interface BatchSavingRepository<T extends BaseEntity, ID> extends JmixDataRepository<T, ID> {
    default void saveInBatch(List<T> entities, int batchSize) {
        SaveContext saveContext = new SaveContext().setDiscardSaved(true);
        for (int i = 0; i < entities.size(); i++) {
            saveContext.saving(entities.get(i));
            // save by batchSize instances
            if ((i + 1) % batchSize == 0 || i == entities.size() - 1) {
                getDataManager().save(saveContext);
                saveContext = new SaveContext().setDiscardSaved(true);
            }
        }
    }
}

@Validated
@Repository
public interface MySpecificRepository extends JmixDataRepository<MySpecificEntity, UUID>, BatchSavingRepository<MySpecificEntity, UUID> {
    default void saveEntriesInBatch(List<@Valid MySpecificEntity> entries) {
        saveInBatch(entries, 100);
    }
}

krivopustov · March 25, 2025, 7:36am

Thank you for sharing your experience. The code you provided is also a good example of using repositories as an intermediate data access layer.

We’ll think about adding a “standard” API for saving a large number of entities: Add public API for efficient saving of many entities · Issue #4307 · jmix-framework/jmix

Regards,
Konstantin