You are generally free to use these datasets in any way you like. Please click on the dataset name to find out more information about it.
All datasets are used in the book "Process Improvement using Data".
Name | Description | Rows | Columns | Tags |
Aeration rate | The total airflow added to an aeration tank, in litres, during a 1 minute period. | 573 | 1 | univariate, monitoring |
Ammonia concentration | The ammonia concentration in a liquid stream, measured every 6 hours, from a waste water treatment unit. | 1440 | 1 | univariate |
Batch yield and purity | The two columns in the data set are: the percentage yield from a batch reactor, and the purity of the feedstock. The feedstock is what we add to the reactor, and the yield is measured after the reaction is completed. The cause-and-effect direction is that the purity of the feedstock has (potential) impact on the yield. | 241 | 2 | multivariate, regression, least-squares |
Batch yields | The historical percentage yields from a batch reactor for 300 sequential batches. | 300 | 1 | univariate |
Bioreactor yields | The percentage yield from a bioreactor given the temperature, impeller speed, duration, and whether or not the reactor has baffles. | 14 | 5 | multivariate, categorical, regression |
Blender efficiency | The effect of 4 factors on blending efficiency. | 18 | 5 | multivariate, doe |
Brittleness index | A plastic product is produced in three parallel reactors (TK104, TK105, or TK107). Each row in the dataset corresponds to a single batch of raw material that was split and fed to the 3 reactors. The values are the brittleness index of the product produced in each reactor. | 15 | 3 | multivariate, missing-data, paired |
Certificates of analysis | Four properties of an important powder raw material were transcribed from the supplier's certificates of analysis. | 122 | 5 | multivariate, monitoring |
Cheddar cheese | Concentrations of acetic acid, H2S, and lactic acid in 30 samples of mature cheddar cheese. A subjective taste value is also provided. | 30 | 4 | multivariate, regression |
Class grades | Grades from a Chemical Engineering course at McMaster University. | 99 | 6 | multivariate, missing-data, regression |
Distillate flowrate | The flow rate of distillate from the top of a distillation column. | 44640 | 1 | univariate |
Distillation tower | Snapshot measurements on 27 variables from a distillation column; measured over 2.5 years. | 253 | 27 | multivariate, outliers, regression |
Electricity usage | Number of kilowatt-hours used in a residential home over a 3.5 month period, 25 November 2011 to 17 March 2012. | 2712 | 5 | univariate, time-series |
Film thickness | The thickness of a plastic film is measured in 4 positions after being cut. The positions of the measurements are top right, top left, bottom right and bottom left. | 160 | 4 | multivariate |
Flotation cell | Data from a zinc-lead flotation cell measured on 5 variables; recorded from the PLCs. | 2922 | 5 | multivariate, time-series |
Food consumption | The relative consumption of certain food items in European and Scandinavian countries. The numbers represent the percentage of the population consuming that food type. | 16 | 20 | multivariate, missing-data |
Food texture | Texture measurements of a pastry-type food. | 50 | 5 | multivariate |
Gas furnace | The gas furnace data set from Box and Jenkins' book on Time Series Analysis (series J). Contains the gas rate and the percentage CO2 in the gas. | 296 | 2 | time-series |
Kamyr digester | Pulp quality is measured by the lignin content remaining in the pulp: the Kappa number. This data set is used to understand which variables in the process influence the Kappa number, and if it can be predicted accurately enough for an inferential sensor application. | 301 | 22 | multivariate, missing-data, time-series |
Kappa number | The Kappa number from a pulp mill. | 4462 | 1 | univariate, monitoring |
LDPE | Data from a low-density polyethylene production process. There are 14 process variables and 5 quality variables (last 5 columns). | 54 | 19 | multivariate, outliers, monitoring |
Oil company DOE | Experimental data; testing amount of 4 materials added (A, B, C, D) in order to achieve a certain volumetric heat capacity, y. | 19 | 5 | multivariate, categorical, doe, regression |
Paper basis weight | The dry basis weight of paper is a measure of its density. These measurements are from an online scanning gauge, taken 30 seconds apart at a large paper manufacturer. | 231 | 1 | univariate, monitoring, time-series |
Peas | The taste of 27 pea varieties as measured by judges. After blanching the peas, the peas were quick-frozen, packed into bags and stored for 3 months. | 60 | 17 | multivariate |
Raw material outcome | Six characterizing measurements for batches of plastic pellets; the outcome when using this material, either Poor or Adequate, is also provided. | 24 | 7 | multivariate, categorical, regression |
Raw material height in a container | The height of plastic pellets in a tall narrow container, measured over a period of 3 months. | 84 | 2 | univariate, time-series |
Raw material properties | Measured characteristics of several batches (lots) of plastic pellets. | 36 | 6 | multivariate, missing-data |
Room temperatures | Temperature measurements, in Kelvin, taken from 4 corners of a room. | 144 | 4 | multivariate |
Rubber colour | The colour of a rubber product; this example is to demonstrate how to build a monitoring chart. | 100 | 1 | univariate, monitoring |
Sawdust | Sawdust from birch, pine and spruce were blended in specific ratios. The corresponding NIR spectra are recorded. | 54 | 1204 | multivariate |
Silicon wafer thickness | Thickness of a single wafer, measured at 9 locations for 184 consecutive batches. A single wafer is removed from a tray of wafers (always at the same position for each batch of wafers) after the chemical vapour deposition process is complete. | 184 | 9 | multivariate, outliers |
Six-point board thickness | Thickness of 2x6 SPF boards from a saw mill. | 5000 | 6 | multivariate |
Solvents | Physical properties of various chemical solvents, such as melting point, boiling point, dipole moment, refractive index, density, solubility. | 103 | 9 | multivariate |
Systematic method | Data are from an open-ended question on a final exam. The values are the grades achieved for the answer to that question, broken down by whether the student used a systematic method, or not. No grades were given for using a systematic method; grades were awarded only on answering the question. | 44 | 2 | multivariate |
Tablet NIR spectral data | Spectra, measured in the transmittance mode, of 460 pharmaceutical tablets; readings are from 600 to 1898 nm in 2 nm increments. | 460 | 650 | multivariate |
Travel times | A driver uses an app to track GPS coordinates as he drives to work and back each day. The app collects the location and elevation data. Data for about 200 trips are summarized in this data set. | 205 | 13 | multivariate, missing-data |
Unlimited time test | The grades from a midterm exam, as well as the time taken by the student to write the exam. It was an "infinite" time midterm, so there was no time pressure to finish within the allocated period. | 80 | 2 | univariate, regression, least-squares |
Unlimited time test 2 | The grades from a midterm exam, as well as the time taken by the student to write the exam. It was an "infinite" time midterm, so there was no time pressure to finish within the allocated period. The test results were from 2013. | 61 | 2 | univariate, regression, least-squares |
Unlimited time test 3 | The grades from a midterm exam, as well as the time taken by the student to write the exam. It was an "infinite" time midterm, so there was no time pressure to finish within the allocated period. The test results were from 2013. | 89 | 2 | univariate, regression, least-squares |
Website traffic | The number of visits to a small website on each day; if a user accesses the site after 30 minutes of inactivity, that will be logged as a new visit. | 214 | 4 | monitoring |
Wine DOE | Data from a fractional factorial for profiling a new wine. The last 5 columns are the taste values from a panel of judges. Higher values are a better overall taste. | 16 | 13 | multivariate, doe, regression |
Wood fibres | A sample of aspen tree fibre as characterized by a fibre quality analyzer (FQA). | 25165 | 6 | multivariate |
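
As a quick sanity check on one entry above, the "Tablet NIR spectral data" description (readings from 600 to 1898 nm in 2 nm increments) implies a wavelength axis whose length should match the 650 columns listed. The sketch below only reconstructs that axis from the description. It assumes both endpoints are included, and the variable names are illustrative rather than taken from the data file itself.

```python
# Wavelength axis implied by the "Tablet NIR spectral data" description:
# 600 nm to 1898 nm, in 2 nm increments (endpoints assumed inclusive).
start_nm, stop_nm, step_nm = 600, 1898, 2

# range() excludes its stop value, so extend it by one step.
wavelengths = list(range(start_nm, stop_nm + step_nm, step_nm))

# The number of wavelengths should equal the column count in the table.
print(len(wavelengths))
print(wavelengths[0], wavelengths[-1])
```

If the column count in a downloaded file differed from this length, it would suggest extra non-spectral columns (for example an identifier) or a different wavelength grid.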