Yield monitor data is among the most important datasets growers use when making management decisions. This data is in point-based geospatial format, collected by harvesters equipped with yield monitor systems. Yield monitor data can provide information on grain flow, moisture, speed, swath width, time and date, and GPS locations.
While harvesting in the field, yield data points are typically collected every 1 to 2 seconds. The harvester travel speed and data logging rate determine the distance between two consecutive yield monitor points (usually 5 to 10 feet). The header width (swath width, ranging from 15 to 40 feet) determines the spacing between two adjacent harvest passes. This yield monitor data is used to create maps that help growers understand how crop performance responds to field variability in nutrients, topography, moisture, management, varieties, and pest/disease pressure. Using historical field yield maps, growers can identify yield zones, such as high-, medium-, or low-performing areas within the field, and use them to select appropriate seeding rates, set site-specific fertilizer rates, or choose the proper variety or hybrid (Figure 1).
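The relationship between travel speed, logging rate, and point spacing is simple arithmetic and can be sketched as follows. This is a minimal illustration: the function name and the example speeds (3.5 and 5 mph) are assumptions chosen for demonstration, not values from a specific combine.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def point_spacing_ft(speed_mph: float, logging_interval_s: float) -> float:
    """Distance traveled between two consecutive logged points, in feet."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    return feet_per_second * logging_interval_s


# Illustrative (assumed) travel speeds combined with the 1 to 2 second
# logging rates mentioned in the text.
for speed_mph in (3.5, 5.0):
    for interval_s in (1.0, 2.0):
        spacing = point_spacing_ft(speed_mph, interval_s)
        print(f"{speed_mph} mph, {interval_s} s logging: {spacing:.1f} ft between points")
```

At an assumed 3.5 mph, this gives roughly 5 feet between points at a 1-second logging rate and roughly 10 feet at a 2-second rate.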
Figure 1. Spatial distribution of soybean yield across a quarter-section field. The map was created from yield monitor data and illustrates yield variability influenced by within-field topography (such as summits, backslopes, and footslopes) and different soil types. Soybean yields ranged from 20 to 60 bushels per acre. Image provided by Deepak Joshi, K-State Extension.
Sources of error in the yield monitor data
Before using yield monitor data to build variability maps, it is crucial to clean the data properly to remove any erroneous records. Data recorded by yield monitoring systems can be unreliable or inaccurate for several reasons.
Case study to demonstrate the importance of cleaning yield monitoring data
A corn field was harvested in the first week of October in McPherson County, KS, using a combine harvester equipped with a yield monitoring system. The raw dataset contained yield records collected every one to two seconds as the combine moved through the field. Each harvest pass was 30 feet wide, and consecutive yield points within each pass were spaced an average of 8.6 feet apart (Figure 2A). The raw data included a range of operational and agronomic information such as grain yield, grain flow rate, grain moisture content, combine travel speed, swath width, time and date, and the GPS coordinates of every harvested point. In total, the raw dataset consisted of 13,044 individual data points, with a mean yield of 156 bu/ac across the field. Raw yield values ranged from 5 to 3,963 bu/ac.
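To see how the recorded fields relate to one another, and why a single bad speed or swath reading can produce an extreme yield value, the sketch below computes yield from grain flow, travel speed, and swath width. It is a simplified mass-flow relation, not the algorithm of any particular yield monitor, and it ignores moisture correction and grain flow delay. The function name, the example flow rate, and the example speeds are assumptions; 56 lb/bu is the standard bushel weight for corn.

```python
SQ_FT_PER_ACRE = 43560
CORN_LB_PER_BU = 56  # standard bushel weight for corn


def yield_bu_per_ac(flow_lb_s, speed_ft_s, swath_ft, lb_per_bu=CORN_LB_PER_BU):
    """Yield = grain harvested per second / area harvested per second."""
    area_ac_per_s = speed_ft_s * swath_ft / SQ_FT_PER_ACRE
    bushels_per_s = flow_lb_s / lb_per_bu
    return bushels_per_s / area_ac_per_s


flow = 35.0  # lb/s, assumed for illustration

# Same grain flow, recorded at a normal speed and at a near-stop speed.
normal = yield_bu_per_ac(flow, speed_ft_s=5.9, swath_ft=30)
slow = yield_bu_per_ac(flow, speed_ft_s=0.5, swath_ft=30)

print(f"Normal speed: {normal:.0f} bu/ac")
print(f"Near-stop speed: {slow:.0f} bu/ac")
```

With the same grain flow, the near-stop reading produces a yield more than ten times larger than the normal-speed reading, which is one way that implausible values can enter a raw dataset.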
The raw data were then cleaned to remove erroneous points, resulting in a more accurate representation of the true field performance (Figure 2B). Cleaning removed about 610 erroneous points (13,044 raw versus 12,434 cleaned) and reduced the standard deviation from 67 bu/ac to 22 bu/ac (Table 1). Standard deviation is a measure of how variable the yield values are. The higher standard deviation in the raw data compared to the cleaned data indicates that many of the extreme values in the raw dataset were not representative of actual field conditions. These outliers inflated the apparent variability of the yield distribution. After cleaning, the mean yield changed very little (156 bu/ac in the raw data to 154 bu/ac in the cleaned data), indicating that the true yield values were preserved. The much lower standard deviation in the cleaned dataset, however, provides a far more accurate and reliable representation of the true spatial yield variability across the field.
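The article does not specify which cleaning rules were applied in this case study. As a purely illustrative sketch, one simple approach is to drop points logged at implausible travel speeds and then drop yield values far from the mean. The field names, the speed limits, and the 3-standard-deviation cutoff below are all assumptions, and the sample data are made up.

```python
from statistics import mean, stdev


def clean_yield_points(points, min_speed_mph=1.0, max_speed_mph=8.0, sd_limit=3.0):
    """Keep points with plausible speed and yield within sd_limit SDs of the mean."""
    # Step 1: drop points logged while nearly stopped or moving implausibly fast.
    moving = [p for p in points if min_speed_mph <= p["speed_mph"] <= max_speed_mph]
    if len(moving) < 2:
        return moving  # not enough points to estimate the spread

    # Step 2: drop yield values far from the mean of the remaining points.
    yields = [p["yield_bu_ac"] for p in moving]
    center, spread = mean(yields), stdev(yields)
    low, high = center - sd_limit * spread, center + sd_limit * spread
    return [p for p in moving if low <= p["yield_bu_ac"] <= high]


# Made-up sample: 20 plausible points, one extreme yield, one near-stop point.
sample = [{"yield_bu_ac": 140 + i, "speed_mph": 4.0} for i in range(20)]
sample.append({"yield_bu_ac": 3963, "speed_mph": 4.0})
sample.append({"yield_bu_ac": 150, "speed_mph": 0.3})

cleaned = clean_yield_points(sample)
print(len(sample), "raw points ->", len(cleaned), "cleaned points")
```

Real cleaning workflows usually apply additional checks, so the thresholds here should be treated as placeholders to adjust for each field and crop.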


Figure 2. Yield monitor data before cleaning (A) and after cleaning (B), highlighting the improvement in data quality for spatial analysis. Images by Deepak Joshi, K-State Extension.
Table 1. A comparison of statistical summaries for raw and cleaned yield monitor data highlights the importance of removing erroneous observations prior to analysis.
| Statistics | Total data points | Mean (bu/ac) | STD* (bu/ac) | CV* (%) | Min (bu/ac) | Max (bu/ac) | Range (bu/ac) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Raw data | 13,044 | 156 | 67 | 43 | 5 | 3,963 | 5 to 3,963 |
| Cleaned data | 12,434 | 154 | 22 | 15 | 41 | 348 | 41 to 348 |
*STD (standard deviation) and CV (coefficient of variation) are measures of variability.
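The coefficient of variation is the standard deviation expressed as a percentage of the mean. The short sketch below recomputes it from the rounded values in Table 1, along with the number of points removed by cleaning. The function and variable names are illustrative assumptions.

```python
def coefficient_of_variation(std, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return std / mean * 100


# Rounded summary values copied from Table 1.
table1 = {
    "Raw data": {"n": 13044, "mean": 156, "std": 67, "min": 5, "max": 3963},
    "Cleaned data": {"n": 12434, "mean": 154, "std": 22, "min": 41, "max": 348},
}

for label, s in table1.items():
    cv = coefficient_of_variation(s["std"], s["mean"])
    spread = s["max"] - s["min"]
    print(f"{label}: CV = {cv:.1f}%, max - min = {spread} bu/ac")

removed = table1["Raw data"]["n"] - table1["Cleaned data"]["n"]
print(f"Points removed by cleaning: {removed}")
```

Because the inputs are rounded, the recomputed CV can differ from the published value by about a percentage point (for the cleaned data, about 14% here versus 15 in Table 1).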
Take-home message
Overall, yield monitor data is essential for understanding a field's spatial variability. It supports many agricultural decisions, including setting seeding and fertilizer rates according to within-field variability. However, the data delivers that value only after it has been properly cleaned and analyzed. Raw or uncleaned data may produce inaccurate yield maps, leading to poor decisions.
Deepak Joshi, Precision Agriculture Specialist
drjoshi@ksu.edu
Logan Simon, Southwest Area Agronomist
lsimon@ksu.edu
Tina Sullivan, Northeast Area Agronomist
tsullivan@ksu.edu
Lucas Haag, Cropping Systems Agronomist at Tribune
lhaag@ksu.edu