Crop Science,
Journal Year:
2024,
Volume and Issue:
64(6), P. 3293 - 3310
Published: Oct. 15, 2024
Abstract
Breeders made remarkable progress in improving productivity and stability of cultivars. Breeding relies on selecting favorable alleles for performance to produce productive varieties across diverse environments. In this study, we analyzed the Genomes to Fields Initiative 2018–2019 genotype by environment interaction (G × E) dataset, focusing on three populations of double haploid (DH) lines derived from crossing the expired Plant Variety Protection (ex‐PVP) inbred line PHW65 with PHN11, Mo44, and MoG. PHW65 is an Iodent/Lancaster‐type inbred; PHN11 is an Iodent type ex‐PVP line; Mo44 is a tropical‐derived line; and MoG is an agronomically poor inbred derived from the variety Mastadon. Hybrids were produced by crossing the resulting DHs with Stiff Stalk testers PHT69 and LH195. The study's objective was to determine the donor inbreds' relative value and understand the impact of selection history on genomic prediction. We conducted a two‐stage analysis to compare hybrid […] G × E variance […] populations. The yield was significantly lower in the […] population […] population. The reduced G × E variance led to increased indirect prediction accuracy (when training and testing data are drawn from the same population but different environments). In cross‐validation, the […] population had the greatest prediction accuracy 45% of the time, followed by […] (30%) and […] (25%). Results demonstrate that […] greater […] the longest selection history (PHN11), contributing to stability.
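The "indirect prediction" setting described in the abstract above (training and testing data drawn from the same population but different environments) can be pictured as a split by environment. The following is a minimal sketch under stated assumptions: the record fields (`line`, `env`, `yield`), the toy values, and the function name `split_by_environment` are hypothetical and do not come from the study.

```python
def split_by_environment(records, test_environments):
    """Partition records so that no test environment appears in training.

    Hypothetical helper: lines may occur in both sets, environments may not.
    """
    train = [r for r in records if r["env"] not in test_environments]
    test = [r for r in records if r["env"] in test_environments]
    return train, test


# Toy records (made-up values, for illustration only).
records = [
    {"line": "DH1", "env": "E1", "yield": 10.2},
    {"line": "DH2", "env": "E1", "yield": 9.8},
    {"line": "DH1", "env": "E2", "yield": 11.0},
    {"line": "DH2", "env": "E2", "yield": 10.5},
]

train, test = split_by_environment(records, {"E2"})
train_envs = {r["env"] for r in train}
test_envs = {r["env"] for r in test}
shared_lines = {r["line"] for r in train} & {r["line"] for r in test}
```

Here the same DH lines appear on both sides of the split while the environments are disjoint, which is the situation the abstract's parenthetical describes.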
Nature Communications,
Journal Year:
2023,
Volume and Issue:
14(1)
Published: Oct. 30, 2023
Abstract
Genotype-by-environment (G×E) interactions can significantly affect crop performance and stability. Investigating G×E requires extensive data sets with diverse cultivars tested over multiple locations and years. The Genomes-to-Fields (G2F) Initiative has tested maize hybrids in more than 130 year-locations in North America since 2014. Here, we curate and expand this data set by generating environmental covariates (using a crop model) for each of the trials. The resulting data set includes DNA genotypes and environmental data linked to more than 70,000 phenotypic records of grain yield and flowering traits for more than 4000 hybrids. We show how this valuable data set can serve as a benchmark in agricultural modeling and prediction, paving the way for countless G×E investigations in maize. We use multivariate analyses to characterize the data set’s genetic and environmental structure, study the association of key environmental factors with traits, and provide benchmarks using genomic prediction models.
Theoretical and Applied Genetics,
Journal Year:
2024,
Volume and Issue:
137(8)
Published: July 23, 2024
Incorporating feature-engineered environmental data into machine learning-based genomic prediction models is an efficient approach to indirectly model genotype-by-environment interactions.
Complementing phenotypic traits and molecular markers with high-dimensional data such as climate and soil information is becoming a common practice in breeding programs. This study explored new ways to combine non-genetic information in genomic prediction models using machine learning. Using the multi-environment trial data from the Genomes To Fields initiative, different models to predict maize grain yield were adjusted using various inputs: genetic, environmental, or a combination of both, either in an additive (genetic-and-environmental; G+E) or a multiplicative (genotype-by-environment interaction; GEI) manner. When including environmental data, the mean prediction accuracy of machine learning genomic prediction models increased up to 7% over the well-established Factor Analytic Multiplicative Mixed Model among the three cross-validation scenarios evaluated. Moreover, using the G+E model was more advantageous than the GEI model given the superior, or at least comparable, prediction accuracy, the lower usage of computational memory and time, and the flexibility of accounting for interactions by construction. Our results illustrate the flexibility provided by the ML framework, particularly with feature engineering. We show that the feature engineering stage offers a viable option for envirotyping and generates valuable information for machine learning-based genomic prediction models. Furthermore, we verified that the genotype-by-environment interactions may be considered using tree-based approaches without explicitly including interactions in the model. These findings support the growing interest in merging high-dimensional genotypic and environmental data into predictive modeling.
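To make the additive (G+E) versus multiplicative (GEI) distinction in the abstract above concrete, here is a minimal sketch of two ways to build one observation's feature vector. It rests on an assumption: the abstract does not say how the GEI inputs were constructed, so the pairwise-product encoding below is only one common choice, and the function names and toy values are hypothetical.

```python
from itertools import product


def additive_features(genetic, environmental):
    """G+E: concatenate genetic and environmental covariates (p + q features)."""
    return list(genetic) + list(environmental)


def multiplicative_features(genetic, environmental):
    """GEI (assumed encoding): one product per (marker, covariate) pair (p * q features)."""
    return [g * e for g, e in product(genetic, environmental)]


# Toy inputs: three marker scores and two environmental covariates (made up).
markers = [0, 1, 2]
covariates = [21.5, 0.8]

g_plus_e = additive_features(markers, covariates)
gei = multiplicative_features(markers, covariates)
```

With p markers and q covariates the additive input has p + q columns and the multiplicative one has p × q, which is consistent with the abstract's remark about the lower memory and time usage of G+E. A tree-based learner given the additive input can still split on a marker and then on a covariate, which is one way to read "accounting for interactions by construction".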
Many genetic models (including models for epistatic effects as well as genetic-by-environment) involve covariance structures that are Hadamard products of lower rank matrices. Implementing these models requires factorizing large Hadamard product matrices. The available algorithms for factorization do not scale well for big data, making the use of some of these models not feasible with large sample sizes. Here, based on properties of Hadamard products and (related) Kronecker products, we propose an algorithm that produces an approximate decomposition that is orders of magnitude faster than the standard eigenvalue decomposition. In this article, we describe the algorithm, show how it can be used to factorize large Hadamard product matrices, present benchmarks, and illustrate the use of the method by presenting an analysis of data from the northern testing locations of the G × E project from the Genomes to Fields Initiative (n ∼ 60,000). We implemented the proposed algorithm in the open-source "tensorEVD" R package.
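The abstract above refers to properties linking Hadamard and Kronecker products without stating them. One textbook identity of that kind is that the Hadamard product of two n × n matrices is a principal submatrix of their Kronecker product. The sketch below checks this on a toy example in plain Python; it is not the tensorEVD algorithm and does not use the package's API, and all function names are hypothetical.

```python
def hadamard(a, b):
    """Element-wise product of two equally sized matrices."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def kronecker(a, b):
    """Kronecker product of square matrices; entry (i*m+k, j*m+l) is a[i][j] * b[k][l]."""
    n, m = len(a), len(b)
    return [
        [a[i][j] * b[k][l] for j in range(n) for l in range(m)]
        for i in range(n)
        for k in range(m)
    ]


def hadamard_from_kronecker(a, b):
    """Read the Hadamard product off the Kronecker product at rows/columns i*n + i."""
    n = len(a)
    kron = kronecker(a, b)
    idx = [i * n + i for i in range(n)]
    return [[kron[r][c] for c in idx] for r in idx]


A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.5], [0.5, 4.0]]

direct = hadamard(A, B)
via_kronecker = hadamard_from_kronecker(A, B)
```

Because eigenvectors of a Kronecker product are Kronecker products of the factors' eigenvectors, an identity like this suggests why factorizing the smaller factor matrices could be cheaper than factorizing the full product; how the proposed algorithm actually exploits it is not specified in the abstract.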
Agronomy,
Journal Year:
2024,
Volume and Issue:
14(4), P. 733 - 733
Published: April 2, 2024
Throughout history, the pursuit of diagnosing and predicting crop yields has evidenced genetics, environment, and management practices intertwined in achieving food security. However, the sensitivity of crop phenotypes and genetic responses to climate still hampers the identification of the underlying abilities of plants to adapt to climate change. We hypothesize that the PiAnosi and WagNer (PAWN) global sensitivity analysis (GSA) coupled with a genetic by environment (GxE) model built with environmental covariance and genetic markers structures, can evidence the contributions of climate on the predictability of maize yields in the U.S. and Ontario, Canada. The GSA-GxE framework estimates the relative contribution of climate variables to improving maize yield predictions. Using an enhanced version of the Genomes to Fields initiative database, the GSA-GxE framework shows that the spatially aggregated […] is attributed to solar radiation, followed by temperature, rainfall, and relative humidity. In one-third of the individually assessed locations, rainfall was the primary responsible for predictability. Also, a consistent pattern of top sensitivities (Relative Humidity, Solar Radiation, and Temperature) as the main or the second most relevant drivers […] shed some light on […] improvement in response to climate change.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: Sept. 20, 2024
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023 the first open-to-the-public Genomes to Fields (G2F) initiative Genotype by Environment (GxE) prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements and field management notes, gathered by the project over nine years. The competition attracted registrants from around the world with representation from academic, government, industry, and non-profit institutions as well as unaffiliated. These participants came from diverse disciplines, including plant science, animal breeding, statistics, computational biology and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner’s strategy involved two models combining machine learning and traditional breeding tools: one model emphasized environment using features extracted by Random Forest, Ridge Regression and Least-squares, and one focused on genetics. Other high-performing teams’ methods included quantitative genetics, classical machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics; weather; and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
Genetics,
Journal Year:
2024,
Volume and Issue:
unknown
Published: Nov. 22, 2024
Abstract
Predicting phenotypes from a combination of genetic and environmental factors is a grand challenge of modern biology. Slight improvements in this area have the potential to save lives, improve food and fuel security, permit better care of the planet, and create other positive outcomes. In 2022 and 2023, the first open-to-the-public Genomes to Fields initiative Genotype by Environment prediction competition was held using a large dataset including genomic variation, phenotype and weather measurements, and field management notes gathered by the project over 9 years. The competition attracted registrants from around the world with representation from academic, government, industry, and nonprofit institutions as well as unaffiliated. These participants came from diverse disciplines, including plant science, animal breeding, statistics, computational biology, and others. Some participants had no formal genetics or plant-related training, and some were just beginning their graduate education. The teams applied varied methods and strategies, providing a wealth of modeling knowledge based on a common dataset. The winner's strategy involved 2 models combining machine learning and traditional breeding tools: 1 model emphasized environment using features extracted by random forest, ridge regression, and least squares, and 1 focused on genetics. Other high-performing teams’ methods included quantitative genetics, machine learning/deep learning, mechanistic models, and model ensembles. The dataset factors used, such as genetics, weather, and management data, were also diverse, demonstrating that no single model or strategy is far superior to all others within the context of this competition.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: Feb. 12, 2024
Abstract
Complementing phenotypic traits and molecular markers with high-dimensional data such as climate and soil information is becoming a common practice in breeding programs. This study explored new ways to integrate non-genetic information in genomic prediction models using machine learning (ML). Using the multi-environment trial data from the Genomes To Fields initiative, different models to predict maize grain yield were adjusted using various inputs: genetic, environmental, or a combination of both, either in an additive (genetic-and-environmental; G+E) or a multiplicative (genotype-by-environment interaction; GEI) manner. When including environmental data, the mean predictive ability of machine learning genomic prediction models increased 7-9% over the well-established Factor Analytic Multiplicative Mixed Model (FA) among the three cross-validation scenarios evaluated. Moreover, using the G+E model was more advantageous than the GEI model given the superior, or at least comparable, predictive ability, the lower usage of computational memory and time, and the flexibility of accounting for interactions by construction. Our results illustrate the flexibility provided by the ML framework, particularly with feature engineering. We show that the feature engineering stage offers a viable option for envirotyping and generates valuable information for machine learning-based genomic prediction models. Furthermore, we verified that the genotype-by-environment interactions may be considered using tree-based approaches without explicitly including interactions in the model. These findings support the growing interest in merging high-dimensional genotypic and environmental data into predictive modeling.
Key message
Incorporating feature-engineered environmental data into machine learning-based genomic prediction models is an efficient approach to indirectly model genotype-by-environment interactions.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: Feb. 15, 2024
In genetic association analysis of complex traits, detection of interaction (either GxG or GxE) can help to elucidate the genetic architecture and biological mechanisms underlying the trait. Detection of interaction in a genome-wide interaction study (GWIS) can be methodologically challenging for various reasons, including a high burden of multiple comparisons when testing for epistasis between all possible pairs of a set of genomewide variants, as well as heteroscedasticity effects occurring in the presence of GxE interaction. In this paper, we address the problem of an even more striking phenomenon that we call the "feast or famine" effect that occurs in this context. We show that for any given GWIS, the type 1 error of standard tests performed can vary widely from the nominal level, where the actual type 1 error for a GWIS varies as a predictable function of the observed trait and environmental values. Using standard methods, some GWISs will have systematically underinflated p-values ("feast"), and others will have systematically overinflated p-values ("famine"), which can lead to false detection of interaction, reduced power, inconsistent results across studies, and failure to replicate true signal. This startling phenomenon is specific to detection of interaction, and it may partly explain why such detection has often proved difficult to replicate. We show that the feast or famine effect occurs for a wide range of approaches, including but not limited to (1) linear mixed model (LMM) […] using approaches such as t-tests/Wald tests, likelihood ratio tests, or score tests; (2) doing a combined interaction-association test in an LMM with F-tests […]; (3) […] with environments or SNPs, when these are modeled as random […] approaches; (4) performing […] with significance assessed by permutation of residuals. We show theoretically that the key cause […] variables conditioned on in the analysis, which suggests an approach to correct […] by changing the way the conditioning is done. Using this insight, we developed the TINGA method to adjust the test statistics to make their p-values closer to uniform under the null hypothesis. In simulations we show that TINGA both controls type 1 error and improves power. TINGA allows for covariates and population structure through use of a linear mixed model and accounts for heteroscedasticity. We apply TINGA to detection of epistasis in a study of flowering time in Arabidopsis thaliana.
bioRxiv (Cold Spring Harbor Laboratory),
Journal Year:
2024,
Volume and Issue:
unknown
Published: March 11, 2024
ABSTRACT
Multi-environment trials (METs) are crucial for identifying varieties that perform well across a target population of environments (TPE). However, METs are typically too small to sufficiently represent all relevant environment-types, and face challenges from changing environment-types due to climate change. Statistical methods that enable prediction of variety performance for new environments beyond the METs are needed. We recently developed MegaLMM, a statistical model that can leverage hundreds of trials to significantly improve genetic value prediction accuracy within METs. Here, we extend MegaLMM to enable genomic prediction in new environments by learning regressions of latent factor loadings on Environmental Covariates (ECs) across trials. We evaluated the extended MegaLMM using the maize Genome-To-Fields dataset, consisting of 4402 varieties cultivated in 195 trials with 87.1% of phenotypic values missing, and demonstrated its high accuracy in genomic prediction under various breeding scenarios. Furthermore, we showcased MegaLMM’s superiority over univariate GBLUP in predicting trait performance of experimental genotypes in new environments. Finally, we explored the use of higher-dimensional quantitative ECs and discussed when and how detailed environmental data can be leveraged for genomic prediction from METs. We propose that MegaLMM can be applied to plant breeding of diverse crops and different fields of genetics where large-scale linear mixed models are utilized.