This Stack Exchange answer reminded me of a useful data science trick. In short, if you try to model y as a sinusoidal function of x, you obtain a regression formula that is nonlinear in the parameters. However, if you know the period, you can use a trig identity to linearize the formula and compute an exact least-squares fit.

Background

Suppose you are trying to create a regression model for data that looks like this:

[Figure: scatter plot of the sample input data. The y variable appears to be a sine wave with a period of roughly 6.28, amplitude a little less than 1, shifted to the right by about 6 units.]

y is clearly a sinusoidal function of x (plus noise), so there are three parameters we need to estimate:

  1. The period T (distance between peaks)
  2. The amplitude A (height of the peaks)
  3. The phase ϕ (where in the cycle the peaks occur)

For some data, there’s really only one value for the period that makes sense. For example, if x represents “day of the year” (as in the Stack Exchange question), then we are clearly expecting a yearly cycle (or else we would have collected exact dates). Thus, T = 365.25 (or however you want to handle leap years), and we have only to estimate the amplitude and phase that best fit the data.

Problem statement

Without loss of generality, we can assume that T = 2π (if not, multiply each xᵢ by 2π/T). Then our objective is to minimize the error in the system of equations

y_i = A \sin( x_i + \phi )

for each (xᵢ, yᵢ) in the input data.

Unfortunately, this model is not linear in the parameters to be estimated (A and ϕ), so we can’t estimate it using ordinary least-squares regression.
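
As a quick aside, the rescaling mentioned above is just a multiplication. Here is a minimal sketch for the day-of-year example (the values below are made up purely for illustration):

import numpy as np

# Hypothetical day-of-year observations; illustrative values only.
day_of_year = np.array([15.0, 60.0, 152.0, 244.0, 330.0])

T = 365.25                       # Known period in the original units
x = day_of_year * 2 * np.pi / T  # Rescaled so that the period becomes 2*pi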

The trick

The trick is to estimate the following model instead, which is linear in the parameters β₀ and β₁:

y_i = \beta_0 \cos x_i + \beta_1 \sin x_i

To see that this model is equivalent, apply a trig identity to find that

A \sin( x_i + \phi ) = A \cos x_i \sin \phi + A \sin x_i \cos \phi.

Matching coefficients on cos xᵢ and sin xᵢ, we have

\begin{aligned} \beta_0 &= A \sin \phi \\ \beta_1 &= A \cos \phi . \end{aligned}

Dividing the first equation by the second gives tan ϕ = β₀ / β₁, so ϕ = atan2(β₀, β₁), where atan2 uses the signs of both coefficients to pick the correct quadrant. Then A = β₀ / sin ϕ = β₁ / cos ϕ.
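
As a quick numerical sanity check (not part of the derivation), we can pick an arbitrary amplitude and phase, form β₀ and β₁, and confirm that the formulas above recover them:

import numpy as np

# Arbitrary amplitude and phase; values chosen only for illustration.
A, phi = 1.7, 2.3

# Coefficients of the linearized model.
beta0 = A * np.sin(phi)
beta1 = A * np.cos(phi)

# Recover phi and A with the formulas above.
phi_recovered = np.arctan2(beta0, beta1)     # ~2.3
A_recovered = beta0 / np.sin(phi_recovered)  # ~1.7

Note that ϕ is only identifiable modulo 2π; atan2 returns a value in (−π, π].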

Demo

The following Python script demonstrates our technique.

#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

if __name__ == "__main__":
    # Make fake training data
    size = 128  # Number of points
    x = rng.uniform(0, 10, size)
    A = rng.standard_exponential()
    phi = rng.uniform(0, 2 * np.pi)
    sigma = 0.20 * A  # Scale of random noise
    y = A * np.sin(x + phi) + rng.normal(size=size, scale=sigma)

    # First plot showing what the data looks like
    plt.scatter(x, y)
    plt.savefig("cyclical-data-before.svg", transparent=True, bbox_inches="tight")

    # Transform x to its cos and sin, then solve
    #   y = x_transformed @ beta
    # by least squares.
    x_transformed = np.stack([np.cos(x), np.sin(x)]).T
    (beta0, beta1), *_ = np.linalg.lstsq(x_transformed, y, rcond=None)  # Discard other outputs

    # Recover A and phi using the transformation we derived. Estimate A using
    # both expressions and take the average.
    phi_pred = np.arctan2(beta0, beta1)
    A_pred = (beta0 / np.sin(phi_pred) + beta1 / np.cos(phi_pred)) / 2

    # Plot the estimated model using regularly spaced values of x
    x_pred = np.linspace(x.min(), x.max(), 500)
    y_pred = A_pred * np.sin(x_pred + phi_pred)
    plt.plot(x_pred, y_pred)
    plt.savefig("cyclical-data-with-fit.svg", transparent=True, bbox_inches="tight")

Here is what the predicted model looks like:

[Figure: the same scatter plot as above, this time with the estimated sine wave drawn on top. The estimated function fits the data well.]

Notes

When to retain the linearized parameterization: In the Python code above, we computed ϕ and A just to demonstrate that the technique actually works. But the computation of A_pred divides by zero if ϕ is a multiple of π/2 (which happens when one of the βⱼ = 0), and it is numerically shaky whenever ϕ is close to such a value. In practice, I recommend discarding the A and ϕ values and running predictions using the y = β₀ cos x + β₁ sin x model instead. Experienced data scientists will recognize this transformation, but you could add a comment along the lines of “this is a linear reparameterization of A sin(x + ϕ).”
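
For concreteness, a minimal sketch of what prediction with the linearized parameterization might look like; beta0, beta1, and x_new below are stand-ins for the fitted coefficients and the new inputs (already rescaled so the period is 2π):

import numpy as np

# Stand-in values: in practice beta0 and beta1 come from the lstsq fit above,
# and x_new holds the new x values you want predictions for.
beta0, beta1 = 0.4, -0.9
x_new = np.linspace(0.0, 10.0, 50)

# Predict directly from the linear reparameterization of A * sin(x + phi).
y_new = beta0 * np.cos(x_new) + beta1 * np.sin(x_new)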

Linear combination of sinusoids: You can also use this technique if your y variable is a linear combination of sine waves of known periods. For example, if x is “minute of day” and y is “volume of emails,” you might expect an overall daily trend (period of 24 hours) with smaller variations within each hour (due to scheduled emails going out at the top of each hour). Then you can set wᵢ = 2π xᵢ / (24 · 60), zᵢ = 2π xᵢ / 60, and minimize the error in

y_i = A_w \sin( w_i + \phi_w ) + A_z \sin( z_i + \phi_z )

using the transformation above.
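
A sketch of what that two-frequency fit could look like, assuming x is minute-of-day; the data below is simulated just to exercise the code:

import numpy as np

rng = np.random.default_rng()

# Simulated data: x is minute of day, y has a daily and an hourly component.
x = rng.uniform(0, 24 * 60, 500)
y = (
    3.0 * np.sin(2 * np.pi * x / (24 * 60) + 1.0)  # Daily cycle
    + 0.5 * np.sin(2 * np.pi * x / 60 + 2.0)       # Hourly cycle
    + rng.normal(scale=0.2, size=x.size)
)

# One (cos, sin) column pair per known period.
w = 2 * np.pi * x / (24 * 60)
z = 2 * np.pi * x / 60
design = np.column_stack([np.cos(w), np.sin(w), np.cos(z), np.sin(z)])

# Solve for all four coefficients at once by least squares.
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
y_fit = design @ beta

Each amplitude and phase can then be recovered from its own (cos, sin) coefficient pair exactly as before.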

Fourier transform: If you don’t know the period T and your xᵢ are evenly spaced, then what you’re looking for is probably not a regression model but a discrete Fourier transform.
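
For example, here is a minimal sketch of estimating the dominant period from evenly spaced samples with NumPy’s FFT (the signal is simulated with a period of 7):

import numpy as np

rng = np.random.default_rng()

# Evenly spaced samples of a noisy sine wave with period 7.
dx = 0.1
x = np.arange(0, 100, dx)
y = 1.5 * np.sin(2 * np.pi * x / 7.0 + 0.4) + rng.normal(scale=0.2, size=x.size)

# The frequency bin with the largest magnitude gives the dominant period.
spectrum = np.fft.rfft(y - y.mean())
freqs = np.fft.rfftfreq(y.size, d=dx)
dominant_period = 1 / freqs[np.argmax(np.abs(spectrum))]  # Roughly 7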