Simple speed up for the LDLT solver

3 minute read

I came across a piece of code designed to speed up the LDLT solver for real-time hair simulation. One optimization within this algorithm involves combining forward substitution with the LDLT decomposition. This technique proves particularly clever when the system is solved only once for each pairing of the left-hand side and right-hand side.

I’ve shared the modified C++ code for this here

Regular LDLT

In a standard LDLT solver, there are three main steps:

  1. LDLT decomposition
  2. Forward substitution
  3. Backward substitution

Once the LDLT decomposition is complete, it can be reused for other right-hand sides. Thus, only the forward and backward substitutions (steps 2 and 3) need to be performed subsequently.

vector<double> ldlt(vector<vector<double>> A, vector<double> b) {
    int n = A.size();
    vector<double> x(n, 0.0);
    vector<vector<double>> L(n, vector<double>(n, 0.0));
    vector<double> D(n, 0.0);
    vector<double> y(n, 0.0);

    // LDL decomposition
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = 0; k < i; k++) {
            sum += L[i][k] * D[k] * L[i][k];
        }
        D[i] = A[i][i] - sum;
        L[i][i] = 1.0;


        for (int j = i + 1; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < i; k++) {
                sum += L[j][k] * D[k] * L[i][k];
            }
            L[j][i] = (A[j][i] - sum) / D[i];
            L[i][j] = 0.0;
        }
    }

    // forward substitution
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = 0; k < i; k++) {
            sum += L[i][k] * y[k];
        }
        y[i] = (b[i] - sum) / L[i][i];
    }

    // backward substitution
    for (int i = n - 1; i >= 0; i--) {
        double sum = 0.0;
        for (int k = i + 1; k < n; k++) {
            sum += L[k][i] * x[k];
        }
        x[i] = (y[i] - sum) / L[i][i];
    }

    return x;
}

Modified LDLT solver

The key observation is that during the forward substitution loop, the variables L[i][k] and L[i][i] are already computed in their corresponding loop within the LDLT decomposition, preceding the computation of L[i][j] and L[j][i] (in the second inner loop mentioned above). Hence, we can merge the first two for-loops.

vector<double> modified_ldlt(vector<vector<double>> A, vector<double> b) {
    int n = A.size();
    vector<double> x(n, 0.0);
    vector<vector<double>> L(n, vector<double>(n, 0.0));
    vector<double> D(n, 0.0);
    vector<double> y(n, 0.0);

    // LDL decomposition
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = 0; k < i; k++) {
            sum += L[i][k] * D[k] * L[i][k];
        }
        D[i] = A[i][i] - sum;
        L[i][i] = 1.0;

        // forward substitution
        sum = 0.0;
        for (int k = 0; k < i; k++) {
            sum += L[i][k] * y[k];
        }
        y[i] = (b[i] - sum) / L[i][i];

        // compute L
        for (int j = i + 1; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < i; k++) {
                sum += L[j][k] * D[k] * L[i][k];
            }
            L[j][i] = (A[j][i] - sum) / D[i];
            L[i][j] = 0.0;
        }
    }

    // backward substitution
    for (int i = n - 1; i >= 0; i--) {
        double sum = 0.0;
        for (int k = i + 1; k < n; k++) {
            sum += L[k][i] * x[k];
        }
        x[i] = (y[i] - sum) / L[i][i];
    }

    return x;
}

While the overall time complexity remains $O(n^2)$, such a performance enhancement can be crucial for real-time applications.

Leave a comment