Help
Ask questions about how to do or fix things in Arrowchat
TPonsky
Customer
 
Posts: 12
Joined: 26 May 2020, 08:07

 

by TPonsky 14 Nov 2020, 12:30

Either this master/slave option doesn't work at all or it isn't working the way it's supposed to.

We have a MySQL cluster using RDS on Amazon's AWS. We have one writer endpoint (set as master in the configuration) and one reader endpoint (set as slave in the configuration). I also made sure that the slave database option is set to 1, as is the slave number.
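
To make the expectation concrete, here is a minimal Python sketch of what a master/slave split is normally understood to do: reads go to the reader endpoint and everything else goes to the writer. This is not ArrowChat's code. The function name `pick_endpoint`, the `in_transaction` flag, and both hostnames are hypothetical and only illustrate the routing rule.

```python
# Hypothetical endpoints for illustration; real RDS endpoints will differ.
WRITER = "writer.example.internal"
READER = "reader.example.internal"

# Statement verbs treated as read-only in this sketch.
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


def pick_endpoint(sql: str, in_transaction: bool = False) -> str:
    """Return the endpoint a statement would be routed to."""
    stripped = sql.lstrip()
    verb = stripped.split(None, 1)[0].upper() if stripped else ""
    # Reads inside an open transaction stay on the writer so they can
    # see rows the same transaction has just written.
    if verb in READ_VERBS and not in_transaction:
        return READER
    # Writes, unknown statements, and empty input all go to the writer.
    return WRITER
```

Under this rule, the buddy list and chatroom SELECTs would land on the reader, and only the occasional INSERT or UPDATE would touch the writer.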

We are also using the push service.

This past week we ran two events with just under 300 users in each. These events involve a screen with an iframe of the chatroom popout and some video. Users joined the meeting and the chatroom loaded the user from our database. For the most part people just watched the video, and the chatroom was there for asking questions.

For the first event, we had the database servers (writer and reader) as db.t3.medium servers in AWS with 2 CPUs and 4GB of RAM. Within a few minutes of the event starting, with maybe only 100 users in the chatroom, the CPU on the writer (master) spiked to 99-100%. It crawled after that. New users coming in had a very hard time getting the chatroom to load. Some users experienced the chatroom going completely blank (just a white background) during the 8-hour event. The reader database barely touched 10% usage, which I can attribute entirely to our Laravel application reading from that database.

Through the ENTIRE 8-hour event, there were only 132 chat messages posted to the chat system. Yet the CPU stayed pegged and the experience was awful for some users. While the event was running, I observed the server processes, and 90% or more of the activity was SELECT statements getting the buddy list or getting the chatroom data. Again, we are using the push service, so it's not polling for new messages.
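
For anyone who wants to repeat that check, a rough way to quantify it is to copy the `Info` column out of `SHOW FULL PROCESSLIST` on each endpoint and tally the statements by their leading verb. The sketch below assumes you already have those strings in a list. The helper names `tally_statements` and `share` are made up for this example, and idle connections (where `Info` is NULL) are counted as `IDLE`.

```python
from collections import Counter


def tally_statements(statements):
    """Count statements by leading SQL verb; None or empty rows count as IDLE."""
    counts = Counter()
    for sql in statements:
        text = (sql or "").strip()
        verb = text.split(None, 1)[0].upper() if text else "IDLE"
        counts[verb] += 1
    return counts


def share(counts, verb):
    """Fraction of all tallied rows that start with the given verb."""
    total = sum(counts.values())
    return counts[verb] / total if total else 0.0
```

Running this against a snapshot from the writer and another from the reader gives a SELECT share per endpoint. If the split were working, the high SELECT share should show up on the reader, not the writer.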

For the second day of the event, I upgraded the database servers to 8 CPUs and 32GB of RAM. We had about the same number of posts to the chatroom (we cleared the previous day's messages before the event), and even with 8 CPUs in the writer server, the CPU usage grew to about 80%. Again, there were about the same number of users, most not doing anything in the room but observing it and our videos. While the second day went MUCH smoother in that the chatroom didn't break down, it's absurd to have to do this for every event we do.

CLEARLY the code isn't working correctly. It should be SELECTing from the slave (reader) endpoint as configured. But it refuses to do so. When I was asking for help from the Arrowchat support people about configuration settings and tuning our servers to handle "thousands of users" (as they claim), they had no willingness to even TRY to diagnose the issues we were having and only gave the suggestion of using master/slave. Well I tried that and it's not working.

So I am asking this community, such as it is, if anyone else knows anything about this product and if it can actually do what they CLAIM it can do. What have you done to make the system handle even a few hundred people at a time? Anything? Is it possible or all smoke?