Show HN: I made Google Trends for Hacker News by indexing 18 years of comments

(hackernewstrends.com)

315 points | by ytkimirti 2 hours ago

45 comments

zX41ZdbW 12 minutes ago
I host a publicly open database with Hacker News data at https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
So you can create any sort of similar services in a single SQL query and an HTML page.
I also host it as a publicly accessible data lake, which you can query from anywhere: https://github.com/ClickHouse/ClickHouse/issues/29693#issuec...
It is also updated in real-time.
Aachen 17 minutes ago
Google Trends is about searches
This is about text submissions. More like if Google Trends counted word occurrences on webpages. Or if Google Ngrams counted webpages instead of books
People don't write much about non-newsworthy things whereas many people search "burger" anytime they want a burger delivery. The datasets aren't usable in the same way
- morkalork 5 minutes ago
  Now if Algolia had a dataset of what people are searching for on HN that'd be it
  - Aachen 0 minutes ago
    Was considering that as well, but I doubt that people use Algolia in the same way that they use Google
kaelyx 59 minutes ago
Hello, /api/hn -> 502 {"error":"Your database has been temporarily rate-limited, please contact support@upstash.com for further details."}
simonpure 1 hour ago
Hug of death
` /api/hn -> 504 An error occurred with your deployment FUNCTION_INVOCATION_TIMEOUT cle1::c8vgv-1782399959042-aeba3cae05ff `
- docheinestages 1 hour ago
  If this project is an ad for their product (Upstash, promising "Highly Available, Infinitely Scalable"), then the last thing they'd want is a hug of death :/
  - ryan_n 1 hour ago
    Oof that would be hilarious/tragic
  - steve1977 1 hour ago
    Downstash
    - y1n0 55 minutes ago
      Must stash
- superxpro12 1 hour ago
  /api/hn -> 502 {"error":"Your database has been temporarily rate-limited, please contact support@upstash.com for further details."}
  - esafak 1 hour ago
    A cache would help.
- Roonerelli 1 hour ago
  I get
  /api/hn -> 502 {"error":"Search entry should have an initialized schema, command was: [\"SEARCH.AGGREGATE\",\"hn\",\"{\\\"$or\\\":[{\\\"title\\\":{\\\"$eq\\\":\\\"anthropic\\\",\\\"$boost\\\":5}},{\\\"text\\\":{\\\"$eq\\\":\\\"anthropic\\\"}}]}\",\"{\\\"by_month\\\":{\\\"$dateHistogram\\\":{\\\"field\\\":\\\"time\\\",\\\"fixedInterval\\\":\\\"30d\\\"}},\\\"top_authors\\\":{\\\"$terms\\\":{\\\"field\\\":\\\"by\\\",\\\"size\\\":6}},\\\"by_type\\\":{\\\"$terms\\\":{\\\"field\\\":\\\"type\\\",\\\"size\\\":4}}}\"]"}
- jjordan 1 hour ago
  back in my day we called this a good ole' fashioned slashdotting.
  - lysace 1 hour ago
    Our startup (~20 people) got slashdotted in 1998 or so. I was the only one randomly awake at the time. Remember watching all the logs from our web server in realtime, ready to immediately kill anything or anyone threatening the overall availability.
    512 kbps uplink, I think. Even accidental DoS was trivial. We had a self-hosted little data center at our office with the only available stupidly expensive commercial connection.
    Felt some dread having to restart the main (async, single-process) web server a few times to keep things going due to bugs in our code. So many* people on dial-up patiently waiting for the page to load.
    It was exhilarating though :).
    *) Surely at least a hundred!
    - mysterydip 29 minutes ago
      One of the things I love about HN is having stories like this in the comments from otherwise random unassuming usernames
  - Onavo 1 hour ago
    It's funny that these days the bottleneck is usually the data layer. Servers are so powerful now that even your average $5 server can handle HN levels of load if configured correctly.
- ytkimirti 1 hour ago
  We will be with you shortly :)
- aNapierkowski 1 hour ago
  yeah we killed it :(
kpw94 25 minutes ago
The huge spike of "lk-99" in science & frontier tech is amusing...
This is a cool concept. I would love a positive/negative sentiment score computed for each comment that refers to a given word, so you can see trends of "cloudflare (positive)" vs "cloudflare (negative)": the first counts comments only if the sentiment score is greater than, say, 0.6, and the second counts comments only if it is less than 0.4 (assuming a [0,1] sentiment score).
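A minimal sketch of the thresholding described above, assuming each comment already carries a sentiment score in [0, 1] from some classifier. The cutoffs 0.6 and 0.4 come from the comment; the names `bucket` and `sentiment_trend`, and the `(month, score)` input shape, are hypothetical and not part of the site.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

POSITIVE_MIN = 0.6  # cutoff suggested in the comment above
NEGATIVE_MAX = 0.4  # cutoff suggested in the comment above


def bucket(score: float) -> Optional[str]:
    """Label a [0, 1] sentiment score, or return None for the neutral band."""
    if score > POSITIVE_MIN:
        return "positive"
    if score < NEGATIVE_MAX:
        return "negative"
    return None


def sentiment_trend(comments: Iterable[Tuple[str, float]]) -> Counter:
    """Count comments per (month, label); neutral comments are skipped.

    `comments` is an assumed shape: (month, score) pairs for comments
    that already matched the search term.
    """
    counts: Counter = Counter()
    for month, score in comments:
        label = bucket(score)
        if label is not None:
            counts[(month, label)] += 1
    return counts
```

Scores between the two cutoffs fall in neither series, so the positive and negative lines would not sum to the raw mention count.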
dwoosley 8 minutes ago
Almost all of the major vulnerabilities and hacks are just single spikes at the time they happened, tailing off after that… except Stuxnet. Stuxnet was much more interesting than most other attacks since it was very political and openly published. Of course, the thing that attack was about is still a news headline today as well.
- john_strinlai 2 minutes ago
  >Stuxnet was much more interesting than most other attacks since it was very political and openly published
  it was also super cool, from a technical perspective. and was one of the first (to my knowledge) to really bring the concept of large-scale attacks against industrial control systems to light.
stopachka 5 minutes ago
Nice! Would love a brief explanation of the infrastructure. I see the "Powered by Upstash Redis Search", but why choose Upstash Redis Search vs something else?
jtolmar 11 minutes ago
It looks like some of these terms aren't indexed (or the site is just too hug of deathed right now), but I'd like to see the graph of like, social media, iot, cryptocurrency, ai.
The transition between crypto and ai on the graphs is already pretty funny. https://hackernewstrends.com/?q=crypto&q=chatgpt
arjie 1 hour ago
One useful feature would be to normalize by total so that I can see changes in something as opposed to just total site growth. Right now I have to chart a single generic parameter but if I pick poorly it’ll confuse the issue.
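A rough sketch of that normalization, assuming per-period counts are available both for the search term and for all posts and comments. The function name `normalize` and the dict-of-counts input are hypothetical, and the numbers in the usage note are made up for illustration.

```python
from typing import Dict


def normalize(term_counts: Dict[str, int], total_counts: Dict[str, int]) -> Dict[str, float]:
    """Return the term's share of all items per period.

    Periods with no items at all are skipped to avoid dividing by zero;
    periods where the term never appears get a share of 0.0.
    """
    shares: Dict[str, float] = {}
    for period, total in total_counts.items():
        if total > 0:
            shares[period] = term_counts.get(period, 0) / total
    return shares
```

For example, a term mentioned 10 times out of 100 items in one year and 40 times out of 1000 items in a later year has a raw count that quadrupled while its share fell from 0.1 to 0.04, which is the distinction between site growth and real change in interest.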
smalltorch 1 hour ago
Reminds me of this side project I'm working on.
https://gitlab/here_forawhile/torum
It's an HN clone that syncs with HN and lets you set up smaller private communities that can discuss anything that's on HN without actually being on HN.
It also indexes the DB and lets you search through it, which I find really useful for finding things that peak my interest.
- hk__2 1 hour ago
  Fixed link: https://gitlab.com/here_forawhile/torum
- all2 20 minutes ago
  *pique
  'peak' refers to the top of a thing, commonly mountains
ltrg 7 minutes ago
It would be super interesting to see if HN mentions serve as a leading indicator of company performance/valuations -- I wouldn't be surprised.
bluecoconut 38 minutes ago
Very cool!
one subtle consistency bug that made it hard for me to interpret when I was clicking around: the small thumbnail plot vs the full plot often (always?) seem to use different colors.
The blue / orange gets assigned to the opposite labels in the A vs. B when you click, which made it confusing to understand.
linzhangrun 14 minutes ago
Great job! I've also been wanting to run similar statistics recently, to find out when LLMs became the dominant topic on HN. Now it seems like half of the posts are about LLMs.
Cider9986 9 minutes ago
Scrolling is totally broken for me.
sinuhe69 1 hour ago
IMO, using AI to assign keywords to a broader group of strict synonymous keywords would make the comparison much more helpful.
Because in general we want to know the trend of categories more than of a word, asking for “auto pilot” for ex. should include “self driving”, FSD etc.
- marky1991 1 hour ago
  I would not like this. This is the kind of change that made Google search so annoying. (E.g. what if I want to track the history of 'self-driving' vs 'auto pilot' in sales pitches? Or more basically, what if the system interprets me wrongly?) Better to support | or similar old-fashioned search engine syntax and dwis and not dwim.
  - Pikamander2 57 minutes ago
    Synonym functionality is good as long as there's an easy way to disable it, either globally or by wrapping the term in quotes.
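One way the opt-out discussed in this subthread could look, as a sketch only: expand a query into its synonym group unless it is wrapped in double quotes. The `SYNONYMS` table and the `expand_query` name are hypothetical; the group reuses the "auto pilot" example from the parent comment, and a real table could just as well be produced by an LLM or curated by hand.

```python
from typing import Dict, List

# Hypothetical synonym table, using the example from the parent comment.
SYNONYMS: Dict[str, List[str]] = {
    "auto pilot": ["auto pilot", "self driving", "fsd"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Return the list of terms to search for.

    A query wrapped in double quotes is taken literally; anything else is
    expanded to its synonym group when one exists.
    """
    q = query.strip()
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return [q[1:-1].lower()]
    key = q.lower()
    return list(synonyms.get(key, [key]))
```

With this shape, `expand_query('auto pilot')` searches the whole group, while `expand_query('"auto pilot"')` searches only the exact term, so tracking one phrase against another stays possible.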
dom96 1 hour ago
Very cool idea. Shows programming language trends pretty well.
https://hackernewstrends.com/?q=Nim&q=Rust&q=Zig
ytkimirti 2 hours ago
Hello HN,
This was a small project of mine after I found out that I can simply download the whole Hacker News archive (~48GB) and play around with it.
You can compare terms just like in google trends and you can also see the exact posts & comments from that time.
I like that you can discover what went crazy in the timeline, they just come up as small burst of activity, it's quite fun to play around with it. https://hackernewstrends.com/?q=litecoin&q=dogecoin&q=solana...
I also have a separate page for the "Who is Hiring?" posts; here is the distribution of programming languages across every monthly "Who is hiring?" post on HN. https://hackernewstrends.com/who-is-hiring
Any kind of feedback is welcome.
- jupr 1 hour ago
  Honestly the HN archive is very valuable. If you had it all on a local db with everything indexed you basically end up with an offline search engine.
  Where is this archive you speak of located?
  - fragmede 1 hour ago
    It's on firebase, per https://github.com/hackernews/api
- cbeach 1 hour ago
  This is excellent.
  A minor suggestion - I'd like to be able to render the current graph taller (full height of my browser window).
  Also some sentiment analysis on the "people" graphs would be very insightful (particularly for the likes of Edward Snowden, Julian Assange, Elon Musk and Sam Altman). Perhaps colour the area under the graph red-orange-green based on the sentiment?
  - ytkimirti 34 minutes ago
    Thanks for the feedback, noted the full-screen request.
    The sentiment analysis is very interesting, I can do that easily. Could be a new page as well. Did you see this anywhere else or just your idea?
    - cbeach 29 minutes ago
      Just my idea. I'm working on a side project https://newsavista.com/invite/ASAD68923E that aggregates news and tracks news trends and changing sentiment on the major stories. With cheap cloud LLMs (and "free" local LLMs) it turns out to be a trivial feature to build.
corv 53 minutes ago
The 'flash vs html5' chart looks strange juxtaposed with that conclusion
- al_borland 22 minutes ago
  There are a few technologies with pretty generic names which don’t lend themselves so well to this kind of trend analysis.
  I was curious about Atom. According to the trend it’s still neck and neck with VS Code. But are people really talking about Atom the text editor that much still, or other types of atoms?
  - fg137 11 minutes ago
    I think Google Trends is actually smart enough to suggest which topic you want to see for the same keywords -- it understands the semantics.
SoKamil 25 minutes ago
Are those raw numbers or adjusted for active users at given point in time?
cloudkj 1 hour ago
This is great, I was just hoping to find a tool like this and specifically scoped to "Show HN" posts? Is there a way to do that?
- ytkimirti 1 hour ago
  Great idea actually, I'll add that as well for sure
thomasgeelens 7 minutes ago
oeeh hug of death, congrats!
ytkimirti 1 hour ago
We had to take the site down for a second, it'll be online in a few minutes. Thanks for trying it out
scarecrw 1 hour ago
Very cool!
I'd love to have some sort of normalization option to separate more subtle positive trends from the general increase in number of posts.
igcorreia 38 minutes ago
The colors of the lines of the big graph are inverted compared to the smaller ones.
NoSalt 1 hour ago
Woah, great work!
I am really liking the trend for "linux": https://hackernewstrends.com/?q=linux
- dgellow 1 hour ago
  Funny how closely that tracks with windows
  https://hackernewstrends.com/?q=linux&q=windows
  - addandsubtract 1 hour ago
    Does the trend only show absolute numbers? Because I think it should be divided by the number of posts during the time frame (day?).
jahala 1 hour ago
Really cool! Where would you get the data for something like this? Is it open, or is it scraped?
NooneAtAll3 55 minutes ago
I'd be interested in "google ngram for hacker news" instead
- ytkimirti 36 minutes ago
  What is missing from it? I've used ngrams as well and this was partly inspired by that.
rightbyte 1 hour ago
Nice. Is the y-axis of the data points normalized by the total number of comments at that time?
Edit: Nvm seems like absolute count if you click the graph.
flakiness 1 hour ago
The example comparisons made me smile. Well done!
chris_money202 1 hour ago
Love this, seems to struggle with newly indexed words. Will try again when the FP load is gone
mkgeorge7 32 minutes ago
This is actually very cool!
mkgeorge7 33 minutes ago
This is actually very cool!
k33n 18 minutes ago
This is quite useful at-a-glance
joelres 1 hour ago
Really beautiful, informative, and functional layout. Great work!
docheinestages 1 hour ago
But can it discover new trends without having to type the keywords?
GL26 1 hour ago
insane ! I don't know if it's possible but it would be huge if we had access to the localisation of the trends
drchaim 1 hour ago
too slow or broken right now
lazystar 1 hour ago
nice. i guess AWS still had nothing to fear from GCP/Azure. ty for this
ProofHouse 53 minutes ago
Yup your upstash is rate limited
jdw64 1 hour ago
COOOOOOOOOOL!!!!!!
vachina 1 hour ago
This is the only HN submission I ever upvoted because it is amazing
- ytkimirti 1 hour ago
  Thanks, it was my first ever post here as well, would you look at that
- fragmede 1 hour ago
  If more people spent time on /new looking for awesome stuff and vouching for dead items, HN would be a better place.
- frankzero 1 hour ago
  I know right
clacker-o-matic 1 hour ago
ooh this is sick! really nice ui too!
some_furry 1 hour ago
https://hackernewstrends.com/?q=furries&q=furry
Hmm, did I break something?
oystersauce8 1 hour ago
love it
JFGAi 1 hour ago
[dead]