Tuesday, January 15, 2008

LINQ to XML: Grouping and Aggregation Gotchas, Part II

LINQ to XML: Grouping and Aggregation Gotchas, Part I of January 7, 2008 bemoaned the lack of running sum examples for business document generation with LINQ to XML. LINQ to XML's powerful grouping and aggregation features make it a logical replacement for inscrutable XSLT 1.0 or 2.0 transforms.

It's very difficult to write a narrative on applying LINQ to XML aggregation to XML Infosets generated with VB 9.0's XML literal data type. Non-trivial examples appear to be better suited to demonstrate these techniques. As Julie Lerman observed in her LINQ - Learn by example post of January 7, 2008:

I have often likened learning LINQ as well as learning Entity SQL to learning T-SQL. Learning T-SQL has, for me, been an evolution. I still have to look up the Cast & Convert topic 99% of the time in the docs whenever I need to convert a datetime to a desired char format.

Even if there were 1001 examples, I know that there will always be something I want to do that I can't find an example for and will have to rely on gaining a better understanding of LINQ in order to accomplish it.

It's a common practice to roll up totals within an individual business document, such as the count and sum of an order's or invoice's extended line items, then add sales taxes and prepaid freight charges, if applicable, and provide a total order or invoice amount. The next rollup(s) might be by territory, customer or both, following by grand totals and averages for the entire Infoset.

Following is an example of the last order from an Infoset generated from hierarchical Northwind Customer, Order, and Order_Detail entities as well as many:1 associations with related Employee, Shipper and Product entities with rollups by Customer and grand totals for the first 100 orders with rollups in blue:

<Customers>
  ...
  <Customer>
    <CompanyName CustomerID="WHITC">White Clover Markets</CompanyName>
    <City>Seattle</City>
    <Country>USA</Country>
    <Orders>
      <Order CustomerID="WHITC">
        <OrderID>10344</OrderID>
        <SalesPerson EmployeeID="4">Margaret Peacock</SalesPerson>
        <OrderDate>11/1/1996</OrderDate>
        <RequiredDate>11/29/1996</RequiredDate>
        <ShippedDate>11/5/1996</ShippedDate>
        <ShipperName ShipperID="2">United Package</ShipperName>
        <ShipToName>White Clover Markets</ShipToName>
        <ShipToCountry>USA</ShipToCountry>
        <Details>
          <Order_Detail OrderID="10344">
            <Quantity>35</Quantity>
            <ProductID>4</ProductID>
            <ProductName>Chef Anton's Cajun Seasoning</ProductName>
            <SKU>48 - 6 oz jars</SKU>
            <UnitPrice>$17.60</UnitPrice>
            <Discount>0.0 %</Discount>
            <Extended>$616.00</Extended>
          </Order_Detail>
          <Order_Detail OrderID="10344">
            <Quantity>70</Quantity>
            <ProductID>8</ProductID>
            <ProductName>Northwoods Cranberry Sauce</ProductName>
            <SKU>12 - 12 oz jars</SKU>
            <UnitPrice>$32.00</UnitPrice>
            <Discount>25.0 %</Discount>
            <Extended>$1,680.00</Extended>
          </Order_Detail>
        </Details>
        <OrderItems>2</OrderItems>
        <OrderSubtotal>$2,296.00</OrderSubtotal>
        <PrepaidFreight>$23.29</PrepaidFreight>
        <OrderTotal>$2,319.29</OrderTotal>
      </Order>
      <Order CustomerID="WHITC">
        <OrderID>10269</OrderID>
        <SalesPerson EmployeeID="5">Steven Buchanan</SalesPerson>
        <OrderDate>7/31/1996</OrderDate>
        <RequiredDate>8/14/1996</RequiredDate>
        <ShippedDate>8/9/1996</ShippedDate>
        <ShipperName ShipperID="1">Speedy Express</ShipperName>
        <ShipToName>White Clover Markets</ShipToName>
        <ShipToCountry>USA</ShipToCountry>
        <Details>
          <Order_Detail OrderID="10269">
            <Quantity>60</Quantity>
            <ProductID>33</ProductID>
            <ProductName>Geitost</ProductName>
            <SKU>500 g</SKU>
            <UnitPrice>$2.00</UnitPrice>
            <Discount>5.0 %</Discount>
            <Extended>$114.00</Extended>
          </Order_Detail>
          <Order_Detail OrderID="10269">
            <Quantity>20</Quantity>
            <ProductID>72</ProductID>
            <ProductName>Mozzarella di Giovanni</ProductName>
            <SKU>24 - 200 g pkgs.</SKU>
            <UnitPrice>$27.80</UnitPrice>
            <Discount>5.0 %</Discount>
            <Extended>$528.20</Extended>
          </Order_Detail>
        </Details>
        <OrderItems>2</OrderItems>
        <OrderSubtotal>$642.20</OrderSubtotal>
        <PrepaidFreight>$4.56</PrepaidFreight>
        <OrderTotal>$646.76</OrderTotal>
      </Order>
    </Orders>
    <CustomerOrders>2</CustomerOrders>
    <OrdersSubtotal>$2,938.20</OrdersSubtotal>
    <OrdersFreight>$27.85</OrdersFreight>
    <OrdersTotal>$2,966.05</OrdersTotal>
  </Customer>
  <GrandTotals>
    <TotalOrders>100</TotalOrders>
    <TotalNetSales>$124,898.40</TotalNetSales>
    <AverageNetSale>$1,248.98</AverageNetSale>
    <TotalFreight>$5,746.95</TotalFreight>
    <AverageFreight>$57.47</AverageFreight>
    <TotalGrossSales>$130,645.40</TotalGrossSales>
  </GrandTotals>
</Customers>

Here's the code that created the preceding Infoset:

Private Sub OrdersByCustomer
    ' Create hierarchical Orders and Order Details by Customer Infoset using 1:many and 
    ' many:1 associations with order summary and orders by customer summary aggregations
    Dim CustomerOrders = _
    <Customers>
        <%= From c In CustomerList _
            Select _
            <Customer>
                <CompanyName CustomerID=<%= c.CustomerID %>>
                    <%= c.CompanyName %>
                </CompanyName>
                <City><%= c.City %></City>
                <Country><%= c.Country %></Country>
                <Orders>
                    <%= From o In c.Orders _
                        Order By o.OrderID Descending _
                        Select _
                        <Order CustomerID=<%= o.CustomerID %>>
                            <OrderID><%= o.OrderID %></OrderID>
                            <SalesPerson EmployeeID=<%= o.EmployeeID %>>
                                <%= o.Employee.FirstName & " " & o.Employee.LastName %>
                            </SalesPerson>
                            <OrderDate>
                                <%= o.OrderDate.Value.ToShortDateString() %>
                            </OrderDate>
                            <RequiredDate>
                                <%= o.RequiredDate.Value.ToShortDateString() %>
                            </RequiredDate>
                            <ShippedDate>
                                <%= o.ShippedDate.Value.ToShortDateString() %>
                            </ShippedDate>
                            <ShipperName ShipperID=<%= o.ShipVia %>>
                                <%= o.Shipper.CompanyName %>
                            </ShipperName>
                            <ShipToName><%= o.ShipName %></ShipToName>
                            <ShipToCountry><%= o.ShipCountry %></ShipToCountry>
                            <Details>
                                <%= From d In o.Order_Details _
                                    Let s = CDec(d.Quantity) * CDec(d.UnitPrice) * _
                                    (1 - CDec(d.Discount)) _
                                    Select _
                                    <Order_Detail OrderID=<%= d.OrderID %>>
                                        <Quantity><%= d.Quantity %></Quantity>
                                        <ProductID><%= d.ProductID %></ProductID>
                                        <ProductName>
                                            <%= d.Product.ProductName %>
                                        </ProductName>
                                        <SKU><%= d.Product.QuantityPerUnit %></SKU>
                                        <UnitPrice>
                                            <%= String.Format("{0:c}", d.UnitPrice) %>
                                        </UnitPrice>
                                        <Discount>
                                            <%= String.Format("{0:p1}", d.Discount) %>
                                        </Discount>
                                        <Extended>
                                            <%= String.Format("{0:c}", s) %>
                                        </Extended>
                                    </Order_Detail> %>
                            </Details>
                            <OrderItems><%= o.Order_Details.Count %></OrderItems>
                            <OrderSubtotal>
                                <%= String.Format("{0:c}", _
                                    (From t In o.Order_Details _
                                    Select CDec(t.Quantity) * _
                                    CDec(t.UnitPrice) * _
                                    (1 - CDec(t.Discount))).Sum()) %>
                            </OrderSubtotal>
                            <PrepaidFreight>
                                <%= String.Format("{0:c}", o.Freight.Value) %>
                            </PrepaidFreight>
                            <OrderTotal>
                                <%= String.Format("{0:c}", o.Freight + _
                                    (From t In o.Order_Details _
                                    Select CDec(t.Quantity) * CDec(t.UnitPrice) * _
                                    (1 - CDec(t.Discount))).Sum()) %></OrderTotal>
                        </Order> %>
                </Orders>
                <CustomerOrders><%= c.Orders.Count() %></CustomerOrders>
                <OrdersSubtotal>
                    <%= String.Format("{0:c}", Aggregate o In c.Orders Into _
                        Sum(Aggregate d In o.Order_Details _
                        Into Sum(d.Quantity * d.UnitPrice * (1 - d.Discount)))) %>
                </OrdersSubtotal>
                <OrdersFreight>
                    <%= String.Format("{0:c}", Aggregate f In c.Orders _
                        Into Sum(f.Freight)) %>
                </OrdersFreight>
                <OrdersTotal>
                    <%= String.Format("{0:c}", (From f In c.Orders _
                        Select f.Freight).Sum() + Aggregate o In c.Orders _
                        Into Sum(Aggregate d In o.Order_Details _
                        Into Sum(d.Quantity * d.UnitPrice * (1 - d.Discount)))) %>
                </OrdersTotal>
            </Customer> %>
        <GrandTotals>
            <TotalOrders><%= OrderList.Count %></TotalOrders>
            <TotalNetSales>
                <%= String.Format("{0:c}", Aggregate d In Order_DetailList _
                    Into Sum(d.Quantity * d.UnitPrice * (1 - d.Discount))) %>
            </TotalNetSales>
            <AverageNetSale>
                <%= String.Format("{0:c}", Aggregate o In OrderList _
                    Into Average(Aggregate d In o.Order_Details _
                    Into Sum(d.Quantity * d.UnitPrice * (1 - d.Discount)))) %>
            </AverageNetSale>
            <TotalFreight>
                <%= String.Format("{0:c}", Aggregate o In OrderList _
                    Into Sum(o.Freight)) %>
            </TotalFreight>
            <AverageFreight>
                <%= String.Format("{0:c}", Aggregate o In OrderList _
                    Into Average(o.Freight)) %>
            </AverageFreight>
            <TotalGrossSales>
                <%= String.Format("{0:c}", (Aggregate o In OrderList _
                    Into Sum(o.Freight)) + Aggregate o In OrderList _
                    Into Sum(Aggregate d In o.Order_Details _
                    Into Sum(d.Quantity * d.UnitPrice * (1 - d.Discount)))) %>
            </TotalGrossSales>
        </GrandTotals>
    </Customers>
End Sub

Notice the repetition of aggregation code in expressions for OrderSubtotal and OrderTotal, OrdersSubtotal and OrdersTotal, and TotalNetSales and TotalGrossSales elements. Let variables are scoped to their own expression hole, so they can't be used in successive calculations.

Despite the extra aggregation operations, performance is excellent. Writing the 4,614-element document from in-memory List<T> objects created by my LINQ In-Memory Object Generator (LIMOG) utility requires an average of only 84 ms. on my 2.8 MHz GHz dual-core development machine.

You can open in IE the complete 181-kB XML document from SkyDrive.

3 comments:

Anonymous said...

"...on my 2.8 MHz dual-core development machine..."

Two steps forward, one step back?

:)

Roger Jennings (--rj) said...

Thanks ... Fixed ;-)

(That's why the world needs editors).

--rj

Anonymous said...

Well. Hope I will never have to fix code like this. How can they even think of adding another language into a language? Stupid VB.NET Team