Alright, we’re still not sure what happened, but everything is up and running now.
The rest of this is a quick technical disclosure:
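Years ago I wrote a module for Prosody that uses the OAuth Resource Owner Password Credentials flow to authenticate users against our JetBrains Hub instance at hub.imfreedom.org. In that flow, Prosody sends the user’s username and password, along with Prosody’s own client ID and secret, to Hub’s token endpoint and gets back an access token if everything checks out.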
Everything was working fine until Monday afternoon, when I started updating things, including JetBrains YouTrack. The last time I tried to update YouTrack, it insisted on using its internal Hub service, and I couldn’t figure out a way to force it to use the external one at hub.imfreedom.org.
That time I had to downgrade, and I lost issues because I hadn’t made a backup right before the upgrade. So this time I made one! However, YouTrack had a different problem this time: it couldn’t talk to Hub. I checked, saw there were upgrades available for Hub, and started that upgrade.
I started a backup of Hub, but then got distracted and fired off the upgrade before the backup finished. I didn’t think this would be a problem, and I don’t think it caused the problem, but I’m mentioning it for completeness.
Anyway, the Hub upgrade completed successfully, and when I checked on YouTrack, it was able to talk to Hub and started up just fine.
I then had dinner and streamed like normal. After the stream I headed up to bed and noticed that Conversations on my phone said I wasn’t connected. I came back to my office to look and saw that everyone was getting invalid credentials errors when attempting to log in to our XMPP server.
When I checked the XMPP server’s logs, they weren’t giving me anything other than “bad credentials”. So I went ahead and wrote a “test script” to help diagnose this outside of Prosody. That script is below:
```shell
AUTH=$(echo "${CLIENT_ID}:${CLIENT_SECRET}" | base64)
curl -v \
  --http1.1 \
  -H "Authorization: Basic ${AUTH} " \
  -d "grant_type=password&username=${USERNAME}&password=${PASSWORD}&scope=${SCOPE}" \
  https://hub.imfreedom.org/hub/api/rest/oauth2/token/
```
Originally I had issues using curl’s `-u` parameter. I don’t recall the exact error message, but I figured it might be encoding things strangely. The JetBrains documentation says to take `<CLIENT-ID>:<CLIENT-SECRET>` and base64-encode it, so that’s what you see on the first line. Except then I started getting an error from Hub saying “invalid service secret”.
So I regenerated the secret in Hub for the Prosody service, updated my script, and got the same error. I created a new service in Hub, updated the script, and got the same error again.
At this point I realized I needed to reach out to JetBrains support with my findings, as it appeared I was stuck. However, it wasn’t until Tuesday afternoon that I realized I had reached out to the YouTrack-specific support, which doesn’t cover Hub. So I opened an issue about it on JetBrains’ YouTrack.
There was a lot of back and forth, probably too much from me, but they were unable to reproduce the issue. I created a new service and gave them its credentials for testing. I also provided my curl request. At that point, their support team noticed a silly mistake I had made.
It turns out that our base64-encoded values were different. This was because I forgot to pass `-n` to `echo` when generating the base64-encoded value, so the trailing newline that `echo` adds was being encoded along with the client ID and secret. I fixed my script and still got the `invalid service secret` error.
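To see why the missing `-n` matters, here is a small sketch that encodes the same credentials both ways. The values `id` and `secret` are made-up placeholders, and it assumes bash plus a `base64` that accepts `-d` (as GNU coreutils does). Plain `echo` appends a newline and `base64` encodes it too, so after decoding, the server presumably sees a client secret with an extra `\n` on the end.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder credentials, only to show the difference.
CLIENT_ID="id"
CLIENT_SECRET="secret"

# Without -n, echo appends a newline, and base64 encodes it as well.
with_newline=$(echo "${CLIENT_ID}:${CLIENT_SECRET}" | base64)
# With -n, only "id:secret" is encoded, which is what Basic auth expects.
without_newline=$(echo -n "${CLIENT_ID}:${CLIENT_SECRET}" | base64)

echo "echo:    ${with_newline}"
echo "echo -n: ${without_newline}"

# Decoding and dumping the bytes makes the stray "\n" easy to spot.
printf '%s' "$with_newline" | base64 -d | od -c
```

With these placeholders the two encoded lines are `aWQ6c2VjcmV0Cg==` and `aWQ6c2VjcmV0`; the trailing `Cg==` is the encoded newline. Comparing the decoded bytes like this is a quick way to check what a header really contains.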
I double-checked the script and found I was still using the old secret. Whoops. I updated it to the new secret and Hub responded with a token, which meant success!!
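For reference, a corrected version of the test script might look roughly like the sketch below. It is not the exact script from that night: it swaps `echo` for `printf '%s'` (which never appends a newline), strips any line wrapping from `base64`’s output, and uses curl’s `--data-urlencode` so that passwords containing characters like `&` or `+` don’t corrupt the form body. The endpoint is the one from the original script; the `CLIENT_ID`, `CLIENT_SECRET`, `USERNAME`, `PASSWORD` and `SCOPE` environment variables are assumed to be set by the caller, and `basic_auth`/`request_token` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch of a corrected Hub token test. Assumes CLIENT_ID, CLIENT_SECRET,
# USERNAME, PASSWORD and SCOPE are exported by the caller.
set -euo pipefail

HUB_TOKEN_URL="https://hub.imfreedom.org/hub/api/rest/oauth2/token/"

# Encode "<client-id>:<client-secret>" for an HTTP Basic Authorization header.
# printf '%s' never appends a newline (the bug in the original script), and
# tr removes any line wrapping base64 adds to long output.
basic_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Request a token with the Resource Owner Password Credentials grant.
# --data-urlencode escapes each value; -w prints the HTTP status at the end.
request_token() {
  curl --silent --show-error \
    --http1.1 \
    -H "Authorization: Basic $(basic_auth "$CLIENT_ID" "$CLIENT_SECRET")" \
    --data-urlencode "grant_type=password" \
    --data-urlencode "username=${USERNAME}" \
    --data-urlencode "password=${PASSWORD}" \
    --data-urlencode "scope=${SCOPE}" \
    -w '\nHTTP %{http_code}\n' \
    "$HUB_TOKEN_URL"
}

# Only contact Hub when credentials are actually set.
if [[ -n "${CLIENT_ID:-}" && -n "${CLIENT_SECRET:-}" ]]; then
  request_token
else
  echo "Set CLIENT_ID, CLIENT_SECRET, USERNAME, PASSWORD and SCOPE first." >&2
fi
```

In principle `curl -u "${CLIENT_ID}:${CLIENT_SECRET}"` builds the same Basic header without any manual encoding, but since that option gave trouble earlier, the sketch keeps the explicit header.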
I then tried to reconnect one of my XMPP accounts and still got the error. I realized I hadn’t updated the secret there yet. I also realized that if this didn’t work, anything still connected would get booted and be unable to get back on, so I went back, verified everything about the testing script, and ran it again.
Everything checked out. So I updated the secret in Kubernetes and restarted Prosody. I waited for the restart to finish, told Pidgin to reconnect, and like magic, everything worked again!
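The “update the secret in Kubernetes and restart Prosody” step could look roughly like the sketch below. The namespace, Secret name, key and Deployment name (`xmpp`, `prosody-hub-oauth`, `client-secret`, `prosody`) are hypothetical placeholders, not our actual manifests, and it assumes Prosody runs as a Deployment that reads the Hub client secret from that Secret when it starts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names: adjust to match the real manifests.
NAMESPACE="${NAMESPACE:-xmpp}"
SECRET_NAME="${SECRET_NAME:-prosody-hub-oauth}"
DEPLOYMENT="${DEPLOYMENT:-prosody}"

update_secret_and_restart() {
  local new_secret="$1"

  # Render the Secret client-side and apply it, so an existing Secret is
  # updated in place instead of "kubectl create" failing because it exists.
  kubectl -n "$NAMESPACE" create secret generic "$SECRET_NAME" \
    --from-literal=client-secret="$new_secret" \
    --dry-run=client -o yaml | kubectl -n "$NAMESPACE" apply -f -

  # Restart the pods so Prosody starts up with the new value.
  kubectl -n "$NAMESPACE" rollout restart "deployment/${DEPLOYMENT}"
  # Block until the new pods are ready before reconnecting clients.
  kubectl -n "$NAMESPACE" rollout status "deployment/${DEPLOYMENT}"
}
```

It would be called as `update_secret_and_restart "$NEW_SECRET"`. Waiting on `rollout status` before telling clients to reconnect avoids racing the old pods that still hold the stale secret.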
We’re still not sure what the exact issue was here. I suspect something about the secret got messed up and generating a new one fixed it, but I never noticed because of the error in my testing script.
At any rate, sorry for the inconvenience, everyone, but I hope you enjoyed this breakdown of what happened. Again, special shout-outs to the JetBrains support team, who helped get this back up and running!