Using and interfacing with a batched solver.
 
  Using batched solvers
This example shows how to use Ginkgo batched solvers with data coming from an application. The "application" in this case is just a function in the example itself; nevertheless, the steps to be taken are shown.
A ‘batch’ here means a set of small linear systems that can be solved independently, but each system is too small to use an entire computing device. A requirement is that all the systems need to have the same sparsity pattern.  
The commented program
template <typename InputType>
auto unbatch(const InputType* batch_object)
{
    auto unbatched_mats =
        std::vector<std::unique_ptr<typename InputType::unbatch_type>>{};
    for (size_type b = 0; b < batch_object->get_num_batch_items(); ++b) {
        unbatched_mats.emplace_back(
            batch_object->create_const_view_for_item(b)->clone());
    }
    return unbatched_mats;
}
 
 
}  
 
'Application' structures and functions
Structure to simulate application data related to the linear systems to be solved.
We use raw pointers below to demonstrate how to handle the situation when the application only gives us raw pointers. Ideally, one should use Ginkgo's gko::Array class here. In this example, we assume that the data is in a format that can directly be given to a batch::matrix::Csr object.
Number of small systems in the batch.
Number of rows in each system.
Number of non-zeros in each system matrix.
Row pointers for one matrix
const index_type* row_ptrs;
Column indices of non-zeros for one matrix
const index_type* col_idxs;
Nonzero values for all matrices in the batch, concatenated
const value_type* all_values;
RHS vectors for all systems in the batch, concatenated
    const value_type* all_rhs;
};
 
 
/ *
 * Generates a batch of tridiagonal systems.
 *
 * @param nrows  Number of rows in each system.
 * @param nsystems  Number of systems in the batch.
 * @param exec  The device executor to use for the solver.
 * @note  Normally, the application may not deal with Ginkgo executors, nor do
 * we need it to. Here, we use the executor for backend-independent device
 * memory allocation. The application, for example, might assume Hip (for AMD
 * GPUs) and use `hipMalloc` directly.
 * /
ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
                                 std::shared_ptr<gko::Executor> exec);
Deallocate application data.
void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec);
 
 
int main(int argc, char* argv[])
{
Print ginkgo version information
 
if (argc == 2 && (std::string(argv[1]) == "--help")) {
    std::cerr << "Usage: " << argv[0]
              << " [executor] [num_systems] [num_rows] [print_residuals] "
                 "[num_reps]"
              << std::endl;
    std::exit(-1);
}
static const version_info & get()
Returns an instance of version_info.
Definition version.hpp:139
  
Where do you want to run your solver ?
The gko::Executor class is one of the cornerstones of Ginkgo. Currently, we have support for an gko::OmpExecutor, which uses OpenMP multi-threading in most of its kernels, a gko::ReferenceExecutor, a single threaded specialization of the OpenMP executor, gko::CudaExecutor, gko::HipExecutor, gko::DpcppExecutor which runs the code on a NVIDIA, AMD and Intel GPUs, respectively. 
- Note
- With the help of C++, you see that you only ever need to change the executor and all the other functions/ routines within Ginkgo should automatically work and run on the executor with any other changes.
const auto executor_string = argc >= 2 ? argv[1] : "reference";
std::map<std::string, std::function<std::shared_ptr<gko::Executor>()>>
    exec_map{
        {"cuda",
         [] {
         }},
        {"hip",
         [] {
         }},
        {"dpcpp",
         [] {
         }},
        {"reference", [] { return gko::ReferenceExecutor::create(); }}};
static std::shared_ptr< CudaExecutor > create(int device_id, std::shared_ptr< Executor > master, bool device_reset, allocation_mode alloc_mode=default_cuda_alloc_mode, CUstream_st *stream=nullptr)
Creates a new CudaExecutor.
static std::shared_ptr< DpcppExecutor > create(int device_id, std::shared_ptr< Executor > master, std::string device_type="all", dpcpp_queue_property property=dpcpp_queue_property::in_order)
Creates a new DpcppExecutor.
static std::shared_ptr< HipExecutor > create(int device_id, std::shared_ptr< Executor > master, bool device_reset, allocation_mode alloc_mode=default_hip_alloc_mode, CUstream_st *stream=nullptr)
Creates a new HipExecutor.
static std::shared_ptr< OmpExecutor > create(std::shared_ptr< CpuAllocatorBase > alloc=std::make_shared< CpuAllocator >())
Creates a new OmpExecutor.
Definition executor.hpp:1396
executor where Ginkgo will perform the computation
const auto exec = exec_map.at(executor_string)();  
 
const size_type num_systems = argc >= 3 ? std::atoi(argv[2]) : 2;
const int num_rows = argc >= 4 ? std::atoi(argv[3]) : 32;  
Whether to print the residuals or not.
const bool print_residuals =
    argc >= 5 ? (std::string(argv[4]) == "true") : false;
The number of repetitions for the timing.
const int num_reps = argc >= 6 ? std::atoi(argv[5]) : 20;
 
Generate data
The "application" generates the batch of linear systems on the device
auto appl_sys = appl_generate_system(num_rows, num_systems, exec);
Create batch_dim object to describe the dimensions of the batch matrix.
auto batch_mat_size =
auto batch_vec_size =
A type representing the dimensions of a multidimensional batch object.
Definition batch_dim.hpp:27
A type representing the dimensions of a multidimensional object.
Definition dim.hpp:26
 
Use of application-allocated memory
We can either work on the existing memory allocated in the application, or we can copy it for the linear solve. Ginkgo expects the nonzero values for all the small matrices to be allocated contiguously, one matrix after the other.
    exec, num_systems * appl_sys.nnz, appl_sys.all_values);
                                                       appl_sys.row_ptrs);
                                                       appl_sys.col_idxs);
    exec, batch_mat_size, std::move(vals_view), std::move(colidxs_view),
    std::move(rowptrs_view)));
static detail::const_array_view< ValueType > const_view(std::shared_ptr< const Executor > exec, size_type size, const value_type *data)
Creates a constant (immutable) array from existing memory.
Definition array.hpp:384
detail::shared_type< OwningPointer > share(OwningPointer &&p)
Marks the object pointed to by p as shared.
Definition utils_helper.hpp:224
  
RHS and solution vectors
batch_stride object specifies the access stride within the individual matrices (vectors) in the batch. In this case, we specify a stride of 1 as the common value for all the matrices. Create RHS, again reusing application allocation
    exec, num_systems * num_rows, appl_sys.all_rhs);
auto b = vec_type::create_const(exec, batch_vec_size, std::move(b_view));
 Create initial guess as 0 and copy to device
auto x = vec_type::create(exec);
auto host_x = vec_type::create(exec->get_master(), batch_vec_size);
for (size_type isys = 0; isys < num_systems; isys++) {
    for (int irow = 0; irow < num_rows; irow++) {
    }
}
x->copy_from(host_x.get());
constexpr T zero()
Returns the additive identity for T.
Definition math.hpp:602
 
Create the batch solver factory
const real_type reduction_factor{1e-10};
Create a batched solver factory with relevant parameters.
auto solver =
    bicgstab::build()
        .with_max_iterations(500)
        .with_tolerance(reduction_factor)
        .with_tolerance_type(gko::batch::stop::tolerance_type::relative)
        .on(exec)
        ->generate(A);
 
Batch logger
Create a logger to obtain the iteration counts and "implicit" residual norms for every system after the solve.
std::shared_ptr<const gko::batch::log::BatchConvergence<value_type>>
static std::unique_ptr< BatchConvergence > create(const mask_type &enabled_events=gko::log::Logger::batch_solver_completed_mask)
Creates a convergence logger.
Definition batch_logger.hpp:92
 
Generate and solve
add the logger to the solver
solver->add_logger(logger);
Solve the batch system
detail::cloned_type< Pointer > clone(const Pointer &p)
Creates a unique clone of the object pointed to by p.
Definition utils_helper.hpp:173
 Warmup
for (int i = 0; i < 3; ++i) {
    x_clone->copy_from(x.get());
    solver->apply(b, x_clone);
}
 
double apply_time = 0.0;
for (int i = 0; i < num_reps; ++i) {
    x_clone->copy_from(x.get());
    exec->synchronize();
    std::chrono::steady_clock::time_point t1 =
        std::chrono::steady_clock::now();
    solver->apply(b, x_clone);
    exec->synchronize();
    std::chrono::steady_clock::time_point t2 =
        std::chrono::steady_clock::now();
    auto time_span =
        std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
    apply_time += time_span.count();
}
x->copy_from(x_clone.get());
This is not necessary, but one might want to remove the logger before the next solve using the same solver object.
solver->remove_logger(logger.get());
 
Check result
Compute norm of RHS on the device and automatically copy to host
auto host_b_norm = real_vec_type::create(exec->get_master(), norm_dim);
host_b_norm->fill(0.0);
 
b->compute_norm2(host_b_norm);
 we need constants on the device
auto one = vec_type::create(exec, norm_dim);
one->fill(1.0);
auto neg_one = vec_type::create(exec, norm_dim);
neg_one->fill(-1.0);
allocate and compute the residual
auto res = vec_type::create(exec, batch_vec_size);
res->copy_from(b);
A->apply(one, x, neg_one, res);
allocate and compute residual norm
auto host_res_norm = real_vec_type::create(exec->get_master(), norm_dim);
host_res_norm->fill(0.0);
res->compute_norm2(host_res_norm);
    exec->get_master(), &logger->get_residual_norm());
    exec->get_master(), &logger->get_num_iterations());
 
if (print_residuals) {
    std::cout << "Residual norm sqrt(r^T r):\n";
detail::temporary_clone< detail::pointee< Ptr > > make_temporary_clone(std::shared_ptr< const Executor > exec, Ptr &&ptr)
Creates a temporary_clone.
Definition temporary_clone.hpp:208
"unbatch" converts a batch object into a vector of objects of the corresponding single type, eg. batch::matrix::Dense --> std::vector<Dense>.
    auto unb_res = detail::unbatch(host_res_norm.get());
    auto unb_bnorm = detail::unbatch(host_b_norm.get());
    for (size_type i = 0; i < num_systems; ++i) {
        std::cout << " System no. " << i
                  << ": residual norm = " << unb_res[i]->at(0, 0)
                  << ", implicit residual norm = "
                  << host_log_resid->get_const_data()[i]
                  << ", iterations = "
                  << host_log_iters->get_const_data()[i] << std::endl;
        const real_type relresnorm =
            unb_res[i]->at(0, 0) / unb_bnorm[i]->at(0, 0);
        if (!(relresnorm <= reduction_factor)) {
            std::cout << "System " << i << " converged only to "
                      << relresnorm << " relative residual." << std::endl;
        }
    }
}
std::cout << "Solver type: "
          << "batch::bicgstab"
          << "\nMatrix size: " << A->get_common_size()
          << "\nNum batch entries: " << A->get_num_batch_items()
          << "\nEntire solve took: " << apply_time / num_reps << " seconds."
          << std::endl;
Ginkgo objects are cleaned up automatically; but the "application" still needs to clean up its data in this case.
    appl_clean_up(appl_sys, exec);
    return 0;
}
Generate the matrix and the vectors. This function emulates the generation of the data from the application.
ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
                                 std::shared_ptr<gko::Executor> exec)
{
    const int nnz = nrows * 3 - 2;
    std::default_random_engine rgen(15);
    std::normal_distribution<real_type> distb(0.5, 0.1);
    std::vector<real_type> spacings(nsystems * nrows);
    std::generate(spacings.begin(), spacings.end(),
                  [&]() { return distb(rgen); });
 
    std::vector<value_type> allvalues(nnz * nsystems);
    for (size_type isys = 0; isys < nsystems; isys++) {
        allvalues.at(isys * nnz) = 2.0 / spacings.at(isys * nrows);
        allvalues.at(isys * nnz + 1) = -1.0;
        for (int irow = 0; irow < nrows - 2; irow++) {
            allvalues.at(isys * nnz + 2 + irow * 3) = -1.0;
            allvalues.at(isys * nnz + 2 + irow * 3 + 1) =
                2.0 / spacings.at(isys * nrows + irow + 1);
            allvalues.at(isys * nnz + 2 + irow * 3 + 2) = -1.0;
        }
        allvalues.at(isys * nnz + 2 + (nrows - 2) * 3) = -1.0;
        allvalues.at(isys * nnz + 2 + (nrows - 2) * 3 + 1) =
            2.0 / spacings.at((isys + 1) * nrows - 1);
        assert(isys * nnz + 2 + (nrows - 2) * 3 + 2 == (isys + 1) * nnz);
    }
 
    std::vector<index_type> rowptrs(nrows + 1);
    rowptrs.at(0) = 0;
    rowptrs.at(1) = 2;
    for (int i = 2; i < nrows; i++) {
        rowptrs.at(i) = rowptrs.at(i - 1) + 3;
    }
    rowptrs.at(nrows) = rowptrs.at(nrows - 1) + 2;
    assert(rowptrs.at(nrows) == nnz);
 
    std::vector<index_type> colidxs(nnz);
    colidxs.at(0) = 0;
    colidxs.at(1) = 1;
    const int nnz_per_row = 3;
    for (int irow = 1; irow < nrows - 1; irow++) {
        colidxs.at(2 + (irow - 1) * nnz_per_row) = irow - 1;
        colidxs.at(2 + (irow - 1) * nnz_per_row + 1) = irow;
        colidxs.at(2 + (irow - 1) * nnz_per_row + 2) = irow + 1;
    }
    colidxs.at(2 + (nrows - 2) * nnz_per_row) = nrows - 2;
    colidxs.at(2 + (nrows - 2) * nnz_per_row + 1) = nrows - 1;
    assert(2 + (nrows - 2) * nnz_per_row + 1 == nnz - 1);
 
    std::vector<value_type> allb(nrows * nsystems);
    for (size_type isys = 0; isys < nsystems; isys++) {
        const value_type bval = distb(rgen);
        std::fill(allb.begin() + isys * nrows,
                  allb.begin() + (isys + 1) * nrows, bval);
    }
 
    index_type* const row_ptrs = exec->alloc<index_type>(nrows + 1);
    exec->copy_from(exec->get_master().get(), 
static_cast<size_type>(nrows + 1),
                    rowptrs.data(), row_ptrs);
    index_type* const col_idxs = exec->alloc<index_type>(nnz);
    exec->copy_from(exec->get_master().get(), 
static_cast<size_type>(nnz),
                    colidxs.data(), col_idxs);
    value_type* const all_values = exec->alloc<value_type>(nsystems * nnz);
    exec->copy_from(exec->get_master().get(), nsystems * nnz, allvalues.data(),
                    all_values);
    value_type* const all_b = exec->alloc<value_type>(nsystems * nrows);
    exec->copy_from(exec->get_master().get(), nsystems * nrows, allb.data(),
                    all_b);
    return {nsystems, nrows, nnz, row_ptrs, col_idxs, all_values, all_b};
}
 
void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec)
{
std::size_t size_type
Integral type used for allocation quantities.
Definition types.hpp:89
In general, the application would control non-const pointers; the const casts below would not be needed.
    exec->free(const_cast<index_type*>(appl_data.row_ptrs));
    exec->free(const_cast<index_type*>(appl_data.col_idxs));
    exec->free(const_cast<value_type*>(appl_data.all_values));
    exec->free(const_cast<value_type*>(appl_data.all_rhs));
}
 
Results
The following is the expected result on the reference executor:
Solver type: batch::bicgstab
Matrix size: (32, 32)
Num batch entries: 2
Entire solve took: 2.18558e-05 seconds.
Comments about programming and debugging 
 
The plain program
 
 
#include <fstream>
#include <iostream>
#include <map>
#include <random>
#include <string>
 
 
#include <ginkgo/ginkgo.hpp>
 
 
using value_type = double;
using index_type = int;
 
 
namespace detail {
 
 
template <typename InputType>
auto unbatch(const InputType* batch_object)
{
    auto unbatched_mats =
        std::vector<std::unique_ptr<typename InputType::unbatch_type>>{};
    for (size_type b = 0; b < batch_object->get_num_batch_items(); ++b) {
        unbatched_mats.emplace_back(
            batch_object->create_const_view_for_item(b)->clone());
    }
    return unbatched_mats;
}
 
 
}  
 
 
struct ApplSysData {
    int nrows;
    int nnz;
    const index_type* row_ptrs;
    const index_type* col_idxs;
    const value_type* all_values;
    const value_type* all_rhs;
};
 
 
ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
                                 std::shared_ptr<gko::Executor> exec);
 
void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec);
 
 
int main(int argc, char* argv[])
{
 
    if (argc == 2 && (std::string(argv[1]) == "--help")) {
        std::cerr << "Usage: " << argv[0]
                  << " [executor] [num_systems] [num_rows] [print_residuals] "
                     "[num_reps]"
                  << std::endl;
        std::exit(-1);
    }
 
    const auto executor_string = argc >= 2 ? argv[1] : "reference";
    std::map<std::string, std::function<std::shared_ptr<gko::Executor>()>>
        exec_map{
            {"cuda",
             [] {
             }},
            {"hip",
             [] {
             }},
            {"dpcpp",
             [] {
             }},
            {"reference", [] { return gko::ReferenceExecutor::create(); }}};
 
    const auto exec = exec_map.at(executor_string)();  
 
    const size_type num_systems = argc >= 3 ? std::atoi(argv[2]) : 2;
 
    const int num_rows = argc >= 4 ? std::atoi(argv[3]) : 32;  
    const bool print_residuals =
        argc >= 5 ? (std::string(argv[4]) == "true") : false;
    const int num_reps = argc >= 6 ? std::atoi(argv[5]) : 20;
    auto appl_sys = appl_generate_system(num_rows, num_systems, exec);
    auto batch_mat_size =
    auto batch_vec_size =
        exec, num_systems * appl_sys.nnz, appl_sys.all_values);
                                                           appl_sys.row_ptrs);
                                                           appl_sys.col_idxs);
        exec, batch_mat_size, std::move(vals_view), std::move(colidxs_view),
        std::move(rowptrs_view)));
        exec, num_systems * num_rows, appl_sys.all_rhs);
    auto b = vec_type::create_const(exec, batch_vec_size, std::move(b_view));
    auto x = vec_type::create(exec);
    auto host_x = vec_type::create(exec->get_master(), batch_vec_size);
    for (size_type isys = 0; isys < num_systems; isys++) {
        for (int irow = 0; irow < num_rows; irow++) {
        }
    }
    x->copy_from(host_x.get());
 
    const real_type reduction_factor{1e-10};
        bicgstab::build()
            .with_max_iterations(500)
            .with_tolerance(reduction_factor)
            .with_tolerance_type(gko::batch::stop::tolerance_type::relative)
            .on(exec)
            ->generate(A);
 
    std::shared_ptr<const gko::batch::log::BatchConvergence<value_type>>
 
 
    for (int i = 0; i < 3; ++i) {
        x_clone->copy_from(x.get());
    }
 
    double apply_time = 0.0;
    for (int i = 0; i < num_reps; ++i) {
        x_clone->copy_from(x.get());
        exec->synchronize();
        std::chrono::steady_clock::time_point t1 =
            std::chrono::steady_clock::now();
        exec->synchronize();
        std::chrono::steady_clock::time_point t2 =
            std::chrono::steady_clock::now();
        auto time_span =
            std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
        apply_time += time_span.count();
    }
    x->copy_from(x_clone.get());
    solver->remove_logger(logger.get());
 
 
    auto host_b_norm = real_vec_type::create(exec->get_master(), norm_dim);
    host_b_norm->fill(0.0);
 
    b->compute_norm2(host_b_norm);
    auto one = vec_type::create(exec, norm_dim);
 
    auto neg_one = vec_type::create(exec, norm_dim);
    neg_one->fill(-1.0);
    auto res = vec_type::create(exec, batch_vec_size);
    res->copy_from(b);
    A->apply(one, x, neg_one, res);
    auto host_res_norm = real_vec_type::create(exec->get_master(), norm_dim);
    host_res_norm->fill(0.0);
    res->compute_norm2(host_res_norm);
        exec->get_master(), &logger->get_residual_norm());
        exec->get_master(), &logger->get_num_iterations());
 
    if (print_residuals) {
        std::cout << "Residual norm sqrt(r^T r):\n";
        auto unb_res = detail::unbatch(host_res_norm.get());
        auto unb_bnorm = detail::unbatch(host_b_norm.get());
        for (size_type i = 0; i < num_systems; ++i) {
            std::cout << " System no. " << i
                      << ": residual norm = " << unb_res[i]->at(0, 0)
                      << ", implicit residual norm = "
                      << host_log_resid->get_const_data()[i]
                      << ", iterations = "
                      << host_log_iters->get_const_data()[i] << std::endl;
            const real_type relresnorm =
                unb_res[i]->at(0, 0) / unb_bnorm[i]->at(0, 0);
            if (!(relresnorm <= reduction_factor)) {
                std::cout << "System " << i << " converged only to "
                          << relresnorm << " relative residual." << std::endl;
            }
        }
    }
    std::cout << "Solver type: "
              << "batch::bicgstab"
              << "\nMatrix size: " << A->get_common_size()
              << "\nNum batch entries: " << A->get_num_batch_items()
              << "\nEntire solve took: " << apply_time / num_reps << " seconds."
              << std::endl;
 
    appl_clean_up(appl_sys, exec);
    return 0;
}
 
 
ApplSysData appl_generate_system(const int nrows, const size_type nsystems,
                                 std::shared_ptr<gko::Executor> exec)
{
    const int nnz = nrows * 3 - 2;
    std::default_random_engine rgen(15);
    std::normal_distribution<real_type> distb(0.5, 0.1);
    std::vector<real_type> spacings(nsystems * nrows);
    std::generate(spacings.begin(), spacings.end(),
                  [&]() { return distb(rgen); });
 
    std::vector<value_type> allvalues(nnz * nsystems);
    for (size_type isys = 0; isys < nsystems; isys++) {
        allvalues.at(isys * nnz) = 2.0 / spacings.at(isys * nrows);
        allvalues.at(isys * nnz + 1) = -1.0;
        for (int irow = 0; irow < nrows - 2; irow++) {
            allvalues.at(isys * nnz + 2 + irow * 3) = -1.0;
            allvalues.at(isys * nnz + 2 + irow * 3 + 1) =
                2.0 / spacings.at(isys * nrows + irow + 1);
            allvalues.at(isys * nnz + 2 + irow * 3 + 2) = -1.0;
        }
        allvalues.at(isys * nnz + 2 + (nrows - 2) * 3) = -1.0;
        allvalues.at(isys * nnz + 2 + (nrows - 2) * 3 + 1) =
            2.0 / spacings.at((isys + 1) * nrows - 1);
        assert(isys * nnz + 2 + (nrows - 2) * 3 + 2 == (isys + 1) * nnz);
    }
 
    std::vector<index_type> rowptrs(nrows + 1);
    rowptrs.at(0) = 0;
    rowptrs.at(1) = 2;
    for (int i = 2; i < nrows; i++) {
        rowptrs.at(i) = rowptrs.at(i - 1) + 3;
    }
    rowptrs.at(nrows) = rowptrs.at(nrows - 1) + 2;
    assert(rowptrs.at(nrows) == nnz);
 
    std::vector<index_type> colidxs(nnz);
    colidxs.at(0) = 0;
    colidxs.at(1) = 1;
    const int nnz_per_row = 3;
    for (int irow = 1; irow < nrows - 1; irow++) {
        colidxs.at(2 + (irow - 1) * nnz_per_row) = irow - 1;
        colidxs.at(2 + (irow - 1) * nnz_per_row + 1) = irow;
        colidxs.at(2 + (irow - 1) * nnz_per_row + 2) = irow + 1;
    }
    colidxs.at(2 + (nrows - 2) * nnz_per_row) = nrows - 2;
    colidxs.at(2 + (nrows - 2) * nnz_per_row + 1) = nrows - 1;
    assert(2 + (nrows - 2) * nnz_per_row + 1 == nnz - 1);
 
    std::vector<value_type> allb(nrows * nsystems);
    for (size_type isys = 0; isys < nsystems; isys++) {
        const value_type bval = distb(rgen);
        std::fill(allb.begin() + isys * nrows,
                  allb.begin() + (isys + 1) * nrows, bval);
    }
 
    index_type* const row_ptrs = exec->alloc<index_type>(nrows + 1);
    exec->copy_from(exec->get_master().get(), 
static_cast<size_type>(nrows + 1),
                    rowptrs.data(), row_ptrs);
    index_type* const col_idxs = exec->alloc<index_type>(nnz);
    exec->copy_from(exec->get_master().get(), 
static_cast<size_type>(nnz),
                    colidxs.data(), col_idxs);
    value_type* const all_values = exec->alloc<value_type>(nsystems * nnz);
    exec->copy_from(exec->get_master().get(), nsystems * nnz, allvalues.data(),
                    all_values);
    value_type* const all_b = exec->alloc<value_type>(nsystems * nrows);
    exec->copy_from(exec->get_master().get(), nsystems * nrows, allb.data(),
                    all_b);
    return {nsystems, nrows, nnz, row_ptrs, col_idxs, all_values, all_b};
}
 
void appl_clean_up(ApplSysData& appl_data, std::shared_ptr<gko::Executor> exec)
{
    exec->free(const_cast<index_type*>(appl_data.row_ptrs));
    exec->free(const_cast<index_type*>(appl_data.col_idxs));
    exec->free(const_cast<value_type*>(appl_data.all_values));
    exec->free(const_cast<value_type*>(appl_data.all_rhs));
}
MultiVector stores multiple vectors in a batched fashion and is useful for batched operations.
Definition batch_multi_vector.hpp:59
Csr is a general sparse matrix format that stores the column indices for each nonzero entry and a cum...
Definition batch_csr.hpp:53
BiCGSTAB or the Bi-Conjugate Gradient-Stabilized is a Krylov subspace solver.
Definition batch_bicgstab.hpp:51
@ solver
Solver events.
Definition profiler_hook.hpp:34
constexpr T one()
Returns the multiplicative identity for T.
Definition math.hpp:630
typename detail::remove_complex_s< T >::type remove_complex
Obtain the type which removed the complex of complex/scalar type or the template parameter of class b...
Definition math.hpp:260