this post was submitted on 04 Dec 2025

ShareGPT


I can give you a complete analytical procedure and example code, but I cannot directly access or extract posts from your Bluesky feed without the data itself. Below is a precise workflow you can apply once you have the post-level metrics (e.g., timestamps and like counts).


1. Data Requirements

Prepare a dataset containing at least:

  • post_id
  • created_at (timestamp)
  • like_count

Filter the dataset to only include posts from the last 7 days.
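A minimal sketch of this preparation step, assuming the posts arrive as a list of dicts with the three fields above. The function name `prepare_posts` and the `now` parameter are illustrative, not part of any Bluesky API.

```python
import pandas as pd

REQUIRED = ["post_id", "created_at", "like_count"]

def prepare_posts(records, now=None, days=7):
    """Validate the required columns and keep posts from the last `days` days."""
    df = pd.DataFrame(records)
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    # Parse timestamps as timezone-aware UTC so comparisons are unambiguous.
    df["created_at"] = pd.to_datetime(df["created_at"], utc=True)

    # `now` is a hypothetical override, useful for reproducible runs.
    now = pd.Timestamp.now(tz="UTC") if now is None else pd.Timestamp(now)
    if now.tzinfo is None:
        now = now.tz_localize("UTC")

    cutoff = now - pd.Timedelta(days=days)
    in_window = (df["created_at"] >= cutoff) & (df["created_at"] <= now)
    return df[in_window].reset_index(drop=True)
```

Passing an explicit `now` keeps the 7-day window fixed when re-running the analysis on a saved export.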


2. Compute the Baseline Trend

If you want outliers relative to a trend (rather than a flat mean), you need to model the expected likes per post. Typical approaches:

A. Linear trend: Fit: like_count = β0 + β1 * time_index

B. Rolling mean trend: Compute a rolling average (e.g., 24-hour or N-post window).

C. LOESS smoothing: Provides a smooth non-parametric trend.

For business analytics, the linear model and LOESS are both common choices.
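Option B can be sketched with a time-based rolling window in pandas. This is one possible reading of "rolling average", assuming a 24-hour window; the helper name `rolling_baseline` is made up for illustration.

```python
import pandas as pd

def rolling_baseline(df, window="24h"):
    """Add an `expected` column holding the rolling mean of like_count.

    Each window covers the interval (t - window, t], so a post's own
    like count is included in its baseline.
    """
    out = df.sort_values("created_at").copy()
    expected = (
        out.set_index("created_at")["like_count"]
        .rolling(window)   # time-based window on the DatetimeIndex
        .mean()
    )
    # Rows are already in timestamp order, so assign positionally.
    out["expected"] = expected.to_numpy()
    return out
```

Because the current post contributes to its own window, a large spike pulls its baseline upward and shrinks its residual. Treat this as a conservative baseline rather than a forecast.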


3. Compute Residuals

Residual = actual_like_count – expected_like_count_from_trend.

Then compute:

  • Mean of residuals (μ)
  • Standard deviation of residuals (σ)

4. Z-Score

z = (residual – μ) / σ

Flag outliers where z >= +2. This captures posts whose likes sit at least two residual standard deviations above the trend.
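Steps 3 and 4 reduce to a few lines once the residuals exist. The sketch below uses only the standard library and a hypothetical helper, `flag_high_outliers`. When the trend is an ordinary least-squares fit with an intercept, μ is zero by construction, so the mean subtraction matters mainly for rolling or LOESS baselines.

```python
from statistics import mean, stdev

def flag_high_outliers(residuals, threshold=2.0):
    """Return one boolean per residual: True where z >= threshold."""
    mu = mean(residuals)
    sigma = stdev(residuals)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        # All residuals identical: nothing can stand out.
        return [False] * len(residuals)
    return [(r - mu) / sigma >= threshold for r in residuals]

# Nine ordinary residuals and one large positive one.
residuals = [-1, 0, 1, -1, 0, 1, -1, 0, 1, 20]
flags = flag_high_outliers(residuals)
```

Only the upper tail is tested, so posts that underperform the trend are never flagged.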


5. Example Python (pseudocode with pandas + statsmodels)

import pandas as pd
import numpy as np
import statsmodels.api as sm

# df columns: post_id, created_at, like_count
df['created_at'] = pd.to_datetime(df['created_at'])

# Filter last 7 days
cutoff = df['created_at'].max() - pd.Timedelta(days=7)
df_week = df[df['created_at'] >= cutoff].copy()

# Trend model: linear regression on time index
df_week = df_week.sort_values('created_at')
# Hours since the first post in the window; keeps the regressor on a modest scale
df_week['t'] = (df_week['created_at'] - df_week['created_at'].min()).dt.total_seconds() / 3600

# add_constant supplies the intercept term (β0)
X = sm.add_constant(df_week['t'])
model = sm.OLS(df_week['like_count'], X).fit()
df_week['expected'] = model.predict(X)

# Residuals and z-score
df_week['residual'] = df_week['like_count'] - df_week['expected']
mu = df_week['residual'].mean()
sigma = df_week['residual'].std()

df_week['z'] = (df_week['residual'] - mu) / sigma

# Outliers: z ≥ +2
outliers = df_week[df_week['z'] >= 2]

6. Output

The outliers DataFrame contains the set of posts whose like performance is significantly above trend using the +2σ criterion.
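To turn the flagged rows into a short report, something like the following may help. It assumes a scored DataFrame with the columns built above (`expected`, `z`); the helper name `top_outliers` is illustrative.

```python
import pandas as pd

def top_outliers(scored, threshold=2.0, k=10):
    """Return up to k flagged posts, strongest positive deviation first."""
    cols = ["post_id", "created_at", "like_count", "expected", "z"]
    flagged = scored.loc[scored["z"] >= threshold, cols]
    return (
        flagged.sort_values("z", ascending=False)
        .head(k)
        .reset_index(drop=True)
    )
```

Calling `top_outliers(df_week)` after the z-score step would list the strongest over-performers alongside their expected like counts.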

[–] PumpkinDrama@reddthat.com 1 points 18 hours ago

Here are several open-source GitHub projects that implement time-series outlier or anomaly detection; you can adapt them to flag posts whose likes far exceed the expected trend on a feed. I grouped them by suitability for your use (simple time-series, streaming, advanced / ML).


✅ Good GitHub projects for outlier detection in time series / counts (e.g. likes)

Project / repo, with description and strengths:

  • ADTK (Anomaly Detection Toolkit): A Python toolkit for unsupervised / rule-based time-series anomaly detection (seasonal, trend, threshold, rolling / moving average, etc.). (GitHub)
  • TODS (Time-series Outlier Detection System): A full-stack automated ML system for outlier detection on multivariate (or univariate) time series: includes preprocessing, feature extraction, detection algorithms, and pipeline automation. (GitHub)
  • dtaianomaly (Python library for time-series anomaly detection): A newer library (2025) offering a broad range of built-in anomaly detectors, preprocessing, and visualization tools; useful if you want a flexible, modern API. (arXiv)
  • chic-ts-outlierdetect (Time Series Forecasting for Outlier Detection): A smaller repo that helps implement and compare candidate forecasting / anomaly-detection models for univariate time series; useful if you prefer forecasting plus residual-based detection rather than simple thresholding. (GitHub)
  • Outlier-Detection (AdysTech): A more classical (R-inspired) approach to time-series outlier detection; can be simpler to integrate if your use case is basic (e.g. count spikes). (GitHub)

For a broader survey or catalogue rather than a single tool, awesome-TS-anomaly-detection provides a curated list of libraries, datasets, and resources; it comes in handy if you want to compare several methods and find the one that works best. (GitHub)


🔎 Which to pick for “post-likes outlier” detection and why

  • If you want quick, simple detection (e.g. flag posts with likes greatly above rolling/trend average), start with ADTK — its rolling/threshold/seasonal detectors match well to a time-series of “likes per post over time.”
  • If you anticipate more complex patterns (daily cycles, seasonal variation, bursts) or want an automated pipeline, TODS or dtaianomaly give more flexibility and power.
  • If you prefer forecast-based residual analysis (compute expected likes via forecasting, then detect residual spikes), chic-ts-outlierdetect is a good fit.
  • If you want tried-and-true classical statistical methods (less dependency, simpler code), Outlier-Detection (AdysTech) is a minimalist alternative.