The adas.utils package

Design of Experiments, the tidy way

Paolo Bosetti

University of Trento

rtug::25.1, 2025-05-30

Contents

  • Design of Experiments (DoE) and factorial plans
  • How to do DoE in vanilla R
  • How to do DoE with the adas.utils package
  • Alternatives
  • Future developments

Design of Experiments and factorial plans

  • Design of Experiments (DoE) is a collection of statistical techniques to plan and analyze industrial experiments
  • Predictors are typically many (10 or more), and can be continuous or categorical
  • Often a first-order model is enough
  • Fitting a response surface on a multidimensional grid can be costly

DoE aims at getting the most information from a minimum of experiments

DoE in brief

  • Plan for a grid of treatments (factorial plan) in the \(n\)-hyperspace, where \(n\) is the number of predictors, or process parameters
  • The FP can be optimized, sacrificing completeness for efficiency (Fractional Factorial Plans)
  • The FP must be randomized, to reduce bias due to systematic errors
  • The FP can be non-replicated, to further reduce costs (Daniel’s method)
  • The FP can be augmented, to add new treatments to an existing plan (Augmented Factorial Plans)
  • We use coded units (i.e. normalized to \([-1,1]\))

2 factors CCD

3 factors CCD

Simple example in vanilla R

Create a non-replicated full factorial plan with three factors, two levels each (\(2^3\)):

# Make the grid
fp <- expand.grid(
  A=c(-1,1), 
  B=c(-1,1), 
  C=c(-1,1), 
  Y=NA
)

# Add orders
fp$StdOrder <- 1:nrow(fp)
fp$RunOrder <- sample(nrow(fp))
fp
   A  B  C  Y StdOrder RunOrder
1 -1 -1 -1 NA        1        5
2  1 -1 -1 NA        2        1
3 -1  1 -1 NA        3        7
4  1  1 -1 NA        4        3
5 -1 -1  1 NA        5        2
6  1 -1  1 NA        6        6
7 -1  1  1 NA        7        8
8  1  1  1 NA        8        4

Then save it (typ. as CSV), perform the experiments, fill the Y yield column, and load it back for the analysis

Simple example in vanilla R

For a fractional factorial plan, we reject one half of the FP according to a defining relationship: \(I=ABC\), which can be transformed as \(C=AB\)

We remove rows where the sign of \(C\) is the product of \(A\) and \(B\):

# Extract the fraction where C=AB
ffp <- fp[fp$C==fp$A*fp$B, ]

# Add orders
fp$StdOrder <- 1:nrow(fp)
ffp$RunOrder <- sample(nrow(ffp))
ffp
   A  B  C  Y StdOrder RunOrder
2  1 -1 -1 NA        2        4
3 -1  1 -1 NA        3        3
5 -1 -1  1 NA        5        1
8  1  1  1 NA        8        2

But mind you! for this to work, columns A, B, and C must NOT be factors! (in the R sense)

Remark on defining relationships

It holds the signs algebra: \(X\cdot X=I,~I\cdot X = X\), thus \(CI = ABCC~\rightarrow~C=AB\)

Simple example in vanilla R

Analyzing the FP is mostly a matter of:

  • defining a linear model, Y~A*B*C
  • using lm() to fit the model
  • using residuals() to check the residuals for normality and patterns
  • using anova() to analyze the model
  • simplify the model if necessary

But if the FP is non-replicated, you can’t fit a model unless you remove some terms from the general linear model Y~A*B*C. To do so, Daniel’s method suggests to make a Q-Q plot of the effects: not straightforward

Simple example in vanilla R

As an example for a \(2^4\) FP, the Daniel’s Q-Q plot of the effects can be obtained by:

# build a full linear model:
fp.lm <- lm(Y ~ A*B*C*D, data=fp)

# prepare plot data:
len     <- length(fp.lm$effects)
effects <- fp.lm$effects[2:len]

# Q-Q plot:
qq      <- qqnorm(effects)
qqline(effects)

# add names:
text(qq$x, qq$y, 
     labels=names(effects))

Simple example in vanilla R

Problems:

  • difficult to manage scaled units vs. non scaled units
  • factor names aren’t mnemonic (which parameter is represented by F?)
  • fractioning an FP is not trivial
  • Daniel’s method is tricky and repetitive
  • augmenting a plan is not trivial when the number of factors is 4 or more
  • everything is not very tidy (in the sense of tidyverse)

Enter adas.utils package

The package is available on CRAN:

install.packages("adas.utils")
library(adas.utils)

Don’t forget to look at the vignette:

vignette("adas.utils")

Plain FPs

Base \(2\cdot 2^2\) FP:

library(adas.utils)

# Two factors, two replicas
fp_design_matrix(2, rep=2)
 Factorial Plan Design Matrix
 Defining Relationship:  ~ A * B 
 Factors:  A B 
 Levels:  -1 1 
 Fraction:  NA 
 Type:  plain 
 
# A tibble: 8 × 7
  StdOrder RunOrder .treat  .rep     A     B Y    
     <int>    <int> <chr>  <int> <dbl> <dbl> <lgl>
1        1        8 (1)        1    -1    -1 NA   
2        2        2 a          1     1    -1 NA   
3        3        6 b          1    -1     1 NA   
4        4        5 ab         1     1     1 NA   
5        5        4 (1)        2    -1    -1 NA   
6        6        1 a          2     1    -1 NA   
7        7        7 b          2    -1     1 NA   
8        8        3 ab         2     1     1 NA   

Note

Note the Yates’ treatment names in .treat column

Plain FPs, with named factors

We can tie factors with corresponding parameter names:

# Two factors, two replicas, 
# with names
fp_design_matrix(2, rep=2) %>%
  fp_add_names(
    A="Temperature",
    B="Pressure"
  )
 Factorial Plan Design Matrix
 Defining Relationship:  ~ A * B 
 Factors:  A B 
 Levels:  -1 1 
 Fraction:  NA 
 Type:  plain 
 Factor names:
    A: Temperature
    B: Pressure
 
# A tibble: 8 × 7
  StdOrder RunOrder .treat  .rep     A     B Y    
     <int>    <int> <chr>  <int> <dbl> <dbl> <lgl>
1        1        1 (1)        1    -1    -1 NA   
2        2        5 a          1     1    -1 NA   
3        3        7 b          1    -1     1 NA   
4        4        4 ab         1     1     1 NA   
5        5        6 (1)        2    -1    -1 NA   
6        6        8 a          2     1    -1 NA   
7        7        2 b          2    -1     1 NA   
8        8        3 ab         2     1     1 NA   

Plain FPs, with named factors and actual scales

Actual scales can be added for reference:

# Two factors, two replicas, 
# with names and scales
fp_design_matrix(2, rep=2) %>% 
  fp_add_names(
    A="Temperature", 
    B="Pressure"
  ) %>% 
  fp_add_scale(
    A=c(20, 25), 
    B=c(75, 125), 
    suffix=".scaled"
  ) %>% 
  # Just to keep output compact 😎 
  select(-c(StdOrder, RunOrder))
 Factorial Plan Design Matrix
 Defining Relationship:  ~ A * B 
 Factors:  A B 
 Levels:  -1 1 
 Fraction:  NA 
 Type:  plain 
 Scales suffix: .scaled
 Scaled factors:
    A.scaled: [20, 25]
    B.scaled: [75, 125]
 Factor names:
    A: Temperature
    B: Pressure
 
# A tibble: 8 × 7
  .treat  .rep     A     B A.scaled B.scaled Y    
  <chr>  <int> <dbl> <dbl>    <dbl>    <dbl> <lgl>
1 (1)        1    -1    -1       20       75 NA   
2 a          1     1    -1       25       75 NA   
3 b          1    -1     1       20      125 NA   
4 ab         1     1     1       25      125 NA   
5 (1)        2    -1    -1       20       75 NA   
6 a          2     1    -1       25       75 NA   
7 b          2    -1     1       20      125 NA   
8 ab         2     1     1       25      125 NA   

Augmented FPs

We can augment a \(2^n\) FP with a central treatment:

# Three factors, scaled
fp_design_matrix(3) %>%
  fp_add_scale(B=c(10, 20)) %>% 
  
  # Augment with a central treatment
  fp_augment_center(rep=4) %>% 
  
  # Just to keep output compact 😎 
  slice_tail(n=6)
 Factorial Plan Design Matrix
 Defining Relationship:  ~ A * B * C 
 Factors:  A B C 
 Levels:  -1 1 
 Fraction:  NA 
 Type:  centered 
 Scales suffix: _s
 Scaled factors:
    B_s: [10, 20]
 
# A tibble: 6 × 9
  StdOrder RunOrder .treat  .rep     A     B     C   B_s Y    
     <int>    <int> <chr>  <int> <dbl> <dbl> <dbl> <dbl> <lgl>
1        7        2 bc         1    -1     1     1    20 NA   
2        8        7 abc        1     1     1     1    20 NA   
3        9       12 center     1     0     0     0    15 NA   
4       10       10 center     2     0     0     0    15 NA   
5       11       11 center     3     0     0     0    15 NA   
6       12        9 center     4     0     0     0    15 NA   

Augmented FPs

And then further augment the FP with axial treatments to get a rotatable Composite Centered Design (CCD):

fp_design_matrix(2) %>%
  fp_add_scale(
    A=c(7,18), 
    B=c(10, 20)) %>%
  fp_augment_center(rep=1) %>%
  # Also augment with axial treatments
  fp_augment_axial(rep=1) %>% 
  # Just to keep output compact 😎 
  select(-RunOrder) %>% 
  as_tibble() # don't print header
# A tibble: 9 × 8
  StdOrder .treat  .rep     A     B   A_s   B_s Y    
     <int> <chr>  <int> <dbl> <dbl> <dbl> <dbl> <lgl>
1        1 (1)        1 -1    -1     7    10    NA   
2        2 a          1  1    -1    18    10    NA   
3        3 b          1 -1     1     7    20    NA   
4        4 ab         1  1     1    18    20    NA   
5        5 center     1  0     0    12.5  15    NA   
6        6 axial      1  0    -1.41 12.5   7.93 NA   
7        7 axial      1 -1.41  0     4.72 15    NA   
8        8 axial      1  1.41  0    20.3  15    NA   
9        9 axial      1  0     1.41 12.5  22.1  NA   

Note

Automatically scaling CCDs helps a lot in correctly defining process settings for each treatment

Augmented FPs

And then further augment the FP with axial treatments to get a rotatable Centered Composite Design (CCD):

fp_design_matrix(2) %>%
  fp_add_scale(
    A=c(7,18), 
    B=c(10, 20)) %>%
  fp_augment_center(rep=1) %>%
  fp_augment_axial(rep=1) %>% 
  # Make a plot
  ggplot(aes(x=A, y=B)) + 
  geom_circle(
    aes(x0=0, y0=0, r=sqrt(2)), 
    lty=2, color=gray(0.5)) +
  geom_label(aes(label=.treat)) +
  coord_fixed(xlim=c(-1.5,1.5)*6/4)

Note

Of course it works on higher dimensions as well

Fractional Factorial Plans

We can fraction a \(2^n\) FP by adding subsequent defining relationships:

fp_design_matrix(5) %>% 
  fp_fraction(~A*B*C*D) %>% 
  fp_fraction(~B*C*D*E) %>%
  select(-c(StdOrder,RunOrder,.rep))
 Factorial Plan Design Matrix
 Defining Relationship:  ~ A * B * C * D * E 
 Factors:  A B C D E 
 Levels:  -1 1 
 Fraction:  I=ABCD I=BCDE 
 Type:  fractional 
 
# A tibble: 8 × 9
  .treat     A     B     C     D     E Y      ABCD  BCDE
  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl>
1 (1)       -1    -1    -1    -1    -1 NA        1     1
2 bc        -1     1     1    -1    -1 NA        1     1
3 bd        -1     1    -1     1    -1 NA        1     1
4 cd        -1    -1     1     1    -1 NA        1     1
5 abe        1     1    -1    -1     1 NA        1     1
6 ace        1    -1     1    -1     1 NA        1     1
7 ade        1    -1    -1     1     1 NA        1     1
8 abcde      1     1     1     1     1 NA        1     1

Save design matrix

You can save the design matrix as CSV file (for collecting experimental data), then load it back into the original FP object (thus preserving attributes):

fp <- fp_design_matrix(4) %>% 
  fp_fraction(~A*B*C*D) %>% 
  fp_write_csv("fp.csv")

fp <- fp %>% 
  fp_read_csv("fp.csv")

fp %>% select(-RunOrder)
 Factorial Plan Design Matrix
 Defining Relationship:  ~ A * B * C * D 
 Factors:  A B C D 
 Levels:  -1 1 
 Fraction:  I=ABCD 
 Type:  fractional 
 
# A tibble: 8 × 9
  StdOrder .treat  .rep     A     B     C     D Y      ABCD
     <int> <chr>  <int> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
1        1 (1)        1    -1    -1    -1    -1 NA        1
2        4 ab         1     1     1    -1    -1 NA        1
3        6 ac         1     1    -1     1    -1 NA        1
4        7 bc         1    -1     1     1    -1 NA        1
5       10 ad         1     1    -1    -1     1 NA        1
6       11 bd         1    -1     1    -1     1 NA        1
7       13 cd         1    -1    -1     1     1 NA        1
8       16 abcd       1     1     1     1     1 NA        1

Note

The saved CSV file has a commented header with FFP details (e.g. defining relationships, factors names, scales, etc.)

Alias structures

Fractioning a FP creates alias structures: the adas.utils package can help you with that too:

fp_alias_matrix(~A*B*C, ~B*C*D)
Defining relationships:
 I=ABC I=BCD I=AD 

     A B AB C AC BC ABC D AD BD ABD CD ACD BCD ABCD
A    0 0  0 0  0  1   0 3  0  0   0  0   0   0    2
B    0 0  0 0  1  0   0 0  0  0   3  2   0   0    0
AB   0 0  0 1  0  0   0 0  0  3   0  0   2   0    0
C    0 0  1 0  0  0   0 0  0  2   0  0   3   0    0
AC   0 1  0 0  0  0   0 0  0  0   2  3   0   0    0
BC   1 0  0 0  0  0   0 2  0  0   0  0   0   0    3
ABC  0 0  0 0  0  0   0 0  2  0   0  0   0   3    0
D    3 0  0 0  0  2   0 0  0  0   0  0   0   0    1
AD   0 0  0 0  0  0   2 0  0  0   0  0   0   1    0
BD   0 0  3 2  0  0   0 0  0  0   0  0   1   0    0
ABD  0 3  0 0  2  0   0 0  0  0   0  1   0   0    0
CD   0 2  0 0  3  0   0 0  0  0   1  0   0   0    0
ACD  0 0  2 3  0  0   0 0  0  1   0  0   0   0    0
BCD  0 0  0 0  0  0   3 0  1  0   0  0   0   0    0
ABCD 2 0  0 0  0  3   0 1  0  0   0  0   0   0    0

Alias structures

The alias matrix can be plotted directly, via ggplot2:

fp_alias_matrix(~A*B*C, ~B*C*D) %>% 
  plot()

Note

The third generator is the dependent one, i.e. the one that has all terms not in common in the first two generators

Daniel’s method

The adas.utils package can also help you with Daniel’s method: the daniel_plot_hn() function takes a linear model object and returns a half-normal plot of the effects:

filtration %>% 
  lm(Y~A*B*C*D, data=.) %>%
  daniel_plot_hn(nlab=6,repel=TRUE) +
  labs(title="Rev. model: Y~A*C+A*D")

Pareto chart of the effects

It’s easy to build a Pareto chart of the effects in a linear model:

filtration %>% 
  lm(Y~A*B*C*D, data=.) %>%
  pareto_chart() +
  theme(
    legend.position = "bottom",
    axis.text.x = 
      element_text(angle=45,hjust=1))

Tukey plots

Tukey’s test TukeyHSD() is not compatible with ggplot2, and its output pretty limited (and not very appealing). The adas.utils package provides a ggTukey() function that can be used to plot the results of Tukey’s test, also with multiple groups:

battery %>%
  ggTukey(Response~Material, 
          splt=~Temperature, 
          conf.level=0.99)

Alternatives?

Only pre-tidyverse packages as:

  • DoE.base
  • FrF2

Future developments

  • adas.utils is currently v1.2.0 on CRAN
    • install.packages("adas.utils")
  • Development version on GitHub v1.2.1
    • devtools::install_github("pbosetti/adas.utils")
  • The package is open to contributions
  • Currently working on:
    • a tool that suggests minimum aberration designs for FFPs \(2^{n-p}\) where \(p\geq 2\)
    • blocking structures

One last thing…

WebR code chunks in Quarto documents

Presentations made in Quarto can be interactive too…

WebR code chunks in Quarto documents

Presentations made in Quarto can be interactive too…

Question time