ShareGPT

Identifying Outlier Posts in a Bluesky Feed Using Z-Score and Trend-Adjusted Like Counts (reddthat.com)

submitted 18 hours ago by PumpkinDrama@reddthat.com to c/sharegpt@reddthat.com

I can give you a complete analytical procedure and example code, but I cannot directly access or extract posts from your Bluesky feed without the data itself. Below is a precise workflow you can apply once you have the post-level metrics (e.g., timestamps and like counts).

1. Data Requirements

Prepare a dataset containing at least:

post_id
created_at (timestamp)
like_count

Filter the dataset to only include posts from the last 7 days.
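If your posts come from Bluesky's public API, they must first be flattened into this schema. The sketch below assumes each item is shaped like a `FeedViewPost` from the `app.bsky.feed.getAuthorFeed` endpoint: `post.uri`, `post.likeCount`, `post.record.createdAt`, and an optional `reason` on repost entries. Check those field names against the current AT Protocol lexicon before relying on them. The function name `feed_to_frame` is hypothetical, and fetching and paginating the feed are left out.

```python
import pandas as pd


def feed_to_frame(feed_items):
    """Flatten Bluesky-style feed items into post_id / created_at / like_count.

    Assumed item shape (an assumption; verify against the AT Protocol lexicon):
      {"post": {"uri": str, "likeCount": int, "record": {"createdAt": ISO-8601 str}},
       "reason": {"$type": "app.bsky.feed.defs#reasonRepost"}}   # reposts only
    """
    rows = []
    for item in feed_items:
        reason = item.get("reason") or {}
        # Skip repost entries so each row is a post counted once, at its own creation time.
        if reason.get("$type", "").endswith("#reasonRepost"):
            continue
        post = item["post"]
        rows.append({
            "post_id": post["uri"],
            # Normalize to UTC so timestamps with different offsets compare correctly.
            "created_at": pd.Timestamp(post["record"]["createdAt"]).tz_convert("UTC"),
            # likeCount may be absent; treat it as zero likes.
            "like_count": int(post.get("likeCount", 0)),
        })
    df = pd.DataFrame(rows, columns=["post_id", "created_at", "like_count"])
    # The same post can appear twice (e.g. overlapping pages); keep the first copy.
    return df.drop_duplicates(subset="post_id").reset_index(drop=True)
```

Run it on the concatenated `feed` arrays from all pages, then apply the 7-day filter to the result.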

2. Compute the Baseline Trend

If you want outliers relative to a trend (rather than a flat mean), you need to model the expected likes per post. Typical approaches:

A. Linear trend: fit like_count = β0 + β1 * time_index by ordinary least squares.

B. Rolling mean trend: Compute a rolling average (e.g., 24-hour or N-post window).

C. LOESS smoothing: Provides a smooth non-parametric trend.

A linear trend is the simplest to fit and explain; LOESS is the more flexible choice when engagement rises and falls non-linearly over the week.
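Option B can be sketched with pandas alone. This is a sketch under stated assumptions: it uses a trailing 24-hour time window and excludes each post from its own baseline (`closed="left"`), because a window that includes the post would be pulled upward by the very spike you want to detect. Posts with no earlier post inside the window fall back to the overall mean, which is an arbitrary choice. The helper name `rolling_expected` is hypothetical.

```python
import pandas as pd


def rolling_expected(df, hours=24):
    """Expected likes per post = mean like_count of earlier posts in a trailing window.

    Assumes df has a datetime 'created_at' column and a numeric 'like_count' column.
    Returns a float Series aligned to df.index.
    """
    d = df.sort_values("created_at")  # time-based rolling needs a monotonic index
    s = pd.Series(d["like_count"].to_numpy(dtype=float), index=d["created_at"])
    # closed="left" -> window [t - hours, t): the current post is excluded.
    trailing = s.rolling(pd.Timedelta(hours=hours), closed="left").mean()
    # No earlier post inside the window (the first post, or after a long gap):
    # fall back to the overall mean.
    trailing = trailing.fillna(s.mean())
    # Map results back to the caller's original row order.
    return pd.Series(trailing.to_numpy(), index=d.index).reindex(df.index)
```

The result can stand in for the OLS prediction as `df_week['expected']`; the residual and z-score steps are unchanged. For option C, statsmodels provides `statsmodels.nonparametric.smoothers_lowess.lowess`; passing `return_sorted=False` returns the fitted values in input order.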

3. Compute Residuals

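Residual = actual_like_count - expected_like_count (from the trend)

Then compute:

Mean of residuals (μ)
Standard deviation of residuals (σ)

For an OLS fit that includes an intercept, μ is zero up to floating-point error; for a rolling-mean or LOESS trend it generally is not, which is why the z-score still subtracts it.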

4. Z-Score

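z = (residual - μ) / σ

Flag outliers where z ≥ +2. This captures posts whose likes sit at least two standard deviations above what the trend predicts. Even if the residuals were perfectly normal, about 2.3% of posts would cross +2σ by chance, so treat the threshold as a screening heuristic rather than a significance test.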
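The z-score step in isolation can be sketched with NumPy (the helper name `flag_high_outliers` is hypothetical). It uses the sample standard deviation (`ddof=1`), which is what pandas `Series.std()` computes; `np.std` defaults to `ddof=0`, so the two give slightly different z-scores on small samples. One consequence worth knowing: with the sample standard deviation, no z-score can exceed (n - 1)/√n, so with five or fewer posts the z ≥ +2 rule can never fire.

```python
import numpy as np


def flag_high_outliers(residuals, threshold=2.0):
    """Return (z, mask): z-scores of the residuals and a boolean mask of z >= threshold."""
    r = np.asarray(residuals, dtype=float)
    if r.size < 2:
        # A sample standard deviation needs at least two values.
        return np.zeros_like(r), np.zeros(r.shape, dtype=bool)
    mu = r.mean()
    sigma = r.std(ddof=1)  # sample standard deviation, same as pandas Series.std()
    if sigma == 0:
        # All residuals identical: nothing stands out.
        z = np.zeros_like(r)
    else:
        z = (r - mu) / sigma
    return z, z >= threshold


# Nine posts exactly on trend and one spike of +10 likes.
z, mask = flag_high_outliers([0] * 9 + [10])
print(round(z[-1], 3), mask.sum())  # prints: 2.846 1

# Only five posts: the largest possible z is 4/sqrt(5) ≈ 1.789, so nothing is flagged.
z5, mask5 = flag_high_outliers([0, 0, 0, 0, 10])
print(mask5.any())  # prints: False
```

Note that a single large spike also inflates σ, which is why the spike above reaches only z ≈ 2.85 despite being the only non-zero residual.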

5. Example Python (pandas + statsmodels)

```python
import pandas as pd
import statsmodels.api as sm

# df: an already-loaded DataFrame with columns post_id, created_at, like_count
# utc=True normalizes timestamps that carry different UTC offsets
df['created_at'] = pd.to_datetime(df['created_at'], utc=True)

# Filter to the last 7 days, counted back from the newest post
# (use pd.Timestamp.now(tz='UTC') instead to count back from the current time)
cutoff = df['created_at'].max() - pd.Timedelta(days=7)
df_week = df[df['created_at'] >= cutoff].copy()

# Trend model: linear regression of likes on time (seconds since the first post)
df_week = df_week.sort_values('created_at')
df_week['t'] = (df_week['created_at'] - df_week['created_at'].min()).dt.total_seconds()

X = sm.add_constant(df_week['t'])
model = sm.OLS(df_week['like_count'], X).fit()
df_week['expected'] = model.predict(X)

# Residuals and z-score; Series.std() is the sample standard deviation (ddof=1)
df_week['residual'] = df_week['like_count'] - df_week['expected']
mu = df_week['residual'].mean()
sigma = df_week['residual'].std()
df_week['z'] = (df_week['residual'] - mu) / sigma

# Outliers: z >= +2, strongest first
outliers = df_week[df_week['z'] >= 2].sort_values('z', ascending=False)
```

6. Output

The outliers DataFrame contains the posts whose like counts sit at least two residual standard deviations above the trend (z ≥ +2).

PumpkinDrama@reddthat.com 18 hours ago

Here are several open-source GitHub projects that implement time-series outlier and anomaly detection. You can adapt them to flag posts whose likes are far above the expected trend on a feed. The table summarizes each one, and the notes after it suggest which fits which use case (simple time series, automated pipelines, forecast-based detection, classical statistics).

✅ Good GitHub projects for outlier detection in time series / counts (e.g. likes)

Project / Repo	Description / Strength
ADTK: Anomaly Detection Toolkit	A Python toolkit for unsupervised / rule-based time-series anomaly detection (seasonal, trend, threshold, rolling/moving-average, etc.). (GitHub)
TODS: Time-series Outlier Detection System	A full-stack automated ML system for outlier detection on multivariate (or univariate) time series: preprocessing, feature extraction, detection algorithms, and pipeline automation. (GitHub)
dtaianomaly: Python library for time-series anomaly detection	A newer library (2025) offering a broad range of built-in anomaly detectors plus preprocessing and visualization tools; useful if you want a flexible, modern API. (arXiv)
chic-ts-outlierdetect: Time Series Forecasting for Outlier Detection	A smaller repo for implementing and comparing candidate forecasting / anomaly-detection models for univariate time series; useful if you prefer forecasting plus residual-based detection over simple thresholding. (GitHub)
Outlier-Detection (AdysTech): Outlier detection in time series	A more classical (R-inspired) approach to time-series outlier detection; can be simpler to integrate if your use case is basic (e.g. count spikes). (GitHub)

For a broader catalogue rather than a single tool, awesome-TS-anomaly-detection is a curated list of libraries, datasets, and other resources; it is handy if you want to explore several methods and find the one that works best. (GitHub)

🔎 Which to pick for “post-likes outlier” detection and why

If you want quick, simple detection (e.g. flag posts with likes well above a rolling or trend average), start with ADTK: its rolling, threshold, and seasonal detectors map well onto a time series of likes per post.
If you anticipate more complex patterns (daily cycles, seasonal variation, bursts) or want an automated pipeline, TODS or dtaianomaly give more flexibility.
If you prefer forecast-based residual analysis (compute expected likes via forecasting, then detect residual spikes), chic-ts-outlierdetect is a good fit.
If you want tried-and-true classical statistical methods (fewer dependencies, simpler code), Outlier-Detection (AdysTech) is a minimalist alternative.