
Lucas Weis Polesello

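The Zalgo Effect is a term used to describe the unexpected outcomes of mixing sync and async JavaScript code.

It means that if you mix these two approaches, SOMETHING weird will happen, usually because the same code path sometimes completes synchronously and sometimes asynchronously, so callers can no longer rely on the order of events.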

It's one of those things you kinda don't understand until you see it in real production systems.
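To make the definition a bit more concrete, here is a minimal sketch of the classic shape of the problem: a function that calls back synchronously on a cache hit and asynchronously on a miss. All names here (`cache`, `fakeFetch`, `readValue`, `traceCall`) are hypothetical and only serve the illustration; they are not from our codebase.

```javascript
// Hypothetical in-memory cache; none of these names come from the real service.
const cache = new Map();

// Stand-in for any async I/O call (file read, network request, ...).
function fakeFetch(key, callback) {
  setTimeout(() => callback(`value-for-${key}`), 10);
}

// Unpredictable: calls back synchronously on a cache hit, asynchronously on a miss.
function readValue(key, callback) {
  if (cache.has(key)) {
    callback(cache.get(key)); // sync path
    return;
  }
  fakeFetch(key, (value) => {
    cache.set(key, value);
    callback(value); // async path
  });
}

// Records the order in which things happen around a single call.
function traceCall(key) {
  return new Promise((resolve) => {
    const order = [];
    readValue(key, () => {
      order.push('callback');
      // Resolve on a later tick so 'after call' is always recorded first.
      setTimeout(() => resolve(order), 0);
    });
    order.push('after call');
  });
}

async function main() {
  console.log(await traceCall('a')); // miss: [ 'after call', 'callback' ]
  console.log(await traceCall('a')); // hit:  [ 'callback', 'after call' ]
}

const done = main();
```

The same call site sees two different orderings depending on hidden state, which is exactly the kind of surprise the term describes.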

So what does it have to do with resource leakage?

One day, our SRE team received a couple of PagerDuty alerts saying our services were restarting and unable to work properly because of Error: No channels left to allocate, i.e. RabbitMQ connections were hitting their channel allocation limit. (See the RabbitMQ documentation on Channels and Connections for reference.)

It was clear some code was leaking channels. No one knew what the cause could be, but I had studied this Zalgo Effect in the Node.js Design Patterns book, and something clicked.

How was I so sure Zalgo was the culprit?

The service throwing that error was only responsible for fanning out a couple of messages to a lot of other services, so it was as easy as creating a Queue object and running N promises concurrently to publish the messages. Checking the RabbitMQ Management UI showed me that we had created N channels for that connection.

But why did it only happen in some scenarios?

That's where the Zalgo Effect comes in.

Our code was built back in ~2015, on Node 4, when the callback style was mainstream. Our engineers created the Queue abstraction, which handled almost 50% of our event-driven architecture by itself, and they had to make a class work with async initialization, which is not easy to do with callbacks.

So the code assumed the following:

  1. Assert the exchange, queues and other necessary resources via something we could call consumeChannel.
    1. The consume channel is created whenever the connection is made.
  2. Our confirmChannel, i.e. the channel we used to publish events, was lazily created, mixing async and sync code.

So the problem lives in 2).

Imagine the following:

  • We call assertConfirmChannel. [Figures: old-assert-confirm, old-get-instance]
  • It checks whether the channel EXISTS or NOT.
  • If it does not, it creates one via a PROMISE and returns control to the event loop.
  • If it does, it returns it.

What happens if two concurrent promises reach the same if before the first one resolves? Both see that the channel does not exist yet, so we create the channel twice and the second assignment overwrites the first. Every channel stays open, but only the last one is ever used.

This is where the code was leaking channels.
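The race can be reproduced without RabbitMQ at all. The sketch below is a hypothetical reconstruction, not our original class: `FakeConnection` merely counts how many channels were requested, and `LeakyQueue` follows the check-then-await shape described above.

```javascript
// Fake connection that only counts channels; a stand-in for a real RabbitMQ client.
class FakeConnection {
  constructor() {
    this.openChannels = 0;
  }
  async createConfirmChannel() {
    this.openChannels += 1;
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated round trip
    return { publish: async (message) => message };
  }
}

// Hypothetical reconstruction of the leaky pattern, not the original code.
class LeakyQueue {
  constructor(connection) {
    this.connection = connection;
    this.confirmChannel = null;
  }

  async assertConfirmChannel() {
    if (this.confirmChannel) return this.confirmChannel; // channel exists: return it
    // Channel missing: control goes back to the event loop at this await, so every
    // concurrent caller still sees `null` above and asks for its own channel.
    this.confirmChannel = await this.connection.createConfirmChannel();
    return this.confirmChannel;
  }

  async publish(message) {
    const channel = await this.assertConfirmChannel();
    return channel.publish(message);
  }
}

// Fans out n messages concurrently and reports how many channels were opened.
async function fanOutLeaky(n) {
  const connection = new FakeConnection();
  const queue = new LeakyQueue(connection);
  await Promise.all(Array.from({ length: n }, (_, i) => queue.publish(`event-${i}`)));
  return connection.openChannels;
}

fanOutLeaky(5).then((count) => console.log(`channels opened: ${count}`)); // channels opened: 5
```

With a single publish the count is 1, which matches why the leak only showed up in the fan-out scenarios.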

Fixing the problem

Well, the fix we actually shipped was simple: call one promise, await it, and only then fan out the other promises.

We kept it that simple to limit risk, and because the code is being refactored into a new style anyway.
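As a rough sketch of that workaround (one possible reading of it, with hypothetical names such as `createLeakyQueue` and `fanOutWarm`), the racy abstraction is left untouched and the caller simply makes sure the channel exists before going concurrent:

```javascript
// Hypothetical, reduced version of the leaky pattern described in the text.
function createLeakyQueue() {
  const stats = { openChannels: 0 };
  let confirmChannel = null;

  async function createChannel() {
    stats.openChannels += 1;
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated round trip
    return { publish: async (message) => message };
  }

  async function publish(message) {
    if (!confirmChannel) confirmChannel = await createChannel(); // still racy
    return confirmChannel.publish(message);
  }

  return { publish, stats };
}

// Workaround: publish one message first so the channel exists, then fan out the rest.
async function fanOutWarm(messages) {
  const queue = createLeakyQueue();
  if (messages.length === 0) return queue.stats.openChannels;
  const [first, ...rest] = messages;
  await queue.publish(first); // the one awaited promise
  await Promise.all(rest.map((message) => queue.publish(message)));
  return queue.stats.openChannels;
}

fanOutWarm(['a', 'b', 'c', 'd']).then((count) => console.log(`channels opened: ${count}`)); // channels opened: 1
```

The race is still in the abstraction; the caller just never triggers it, which is why this is a low-risk patch rather than a real fix.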

How can I fix it if I see something like that?

If you want a real solution, here's what the V2 would look like. The idea is to create the promise and store the promise itself in a variable, instead of awaiting it and storing the result. Example as below:

[Figure: assert-zalgo]

This fixes the problem: the variable holds the promise itself, and we only check whether it already exists.
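Since the original screenshot is not reproduced here, the following is a minimal sketch of the idea and not the exact V2 code. `FakeConnection`, `SafeQueue` and `confirmChannelPromise` are hypothetical names; the fake connection only counts channel requests.

```javascript
// Fake connection that only counts channels; a stand-in for a real RabbitMQ client.
class FakeConnection {
  constructor() {
    this.openChannels = 0;
  }
  async createConfirmChannel() {
    this.openChannels += 1;
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated round trip
    return { publish: async (message) => message };
  }
}

class SafeQueue {
  constructor(connection) {
    this.connection = connection;
    this.confirmChannelPromise = null; // holds the promise, not the channel
  }

  // Deliberately not `async`: the check and the assignment run in one synchronous
  // block, so no other caller can slip in between them.
  assertConfirmChannel() {
    if (!this.confirmChannelPromise) {
      this.confirmChannelPromise = this.connection.createConfirmChannel();
    }
    return this.confirmChannelPromise;
  }

  async publish(message) {
    const channel = await this.assertConfirmChannel();
    return channel.publish(message);
  }
}

// Fans out n messages concurrently and reports how many channels were opened.
async function fanOutSafe(n) {
  const connection = new FakeConnection();
  const queue = new SafeQueue(connection);
  await Promise.all(Array.from({ length: n }, (_, i) => queue.publish(`event-${i}`)));
  return connection.openChannels;
}

fanOutSafe(5).then((count) => console.log(`channels opened: ${count}`)); // channels opened: 1
```

One caveat of this minimal form: if the creation fails, the rejected promise stays cached, so a real version would probably want to clear the field on failure.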

For a more robust style, where you actually need to initialize a couple of resources, you could do something like the following:

[Figure: get-or-create-client-print]

  1. Create a function that runs the entire initialization as a single promise.
  2. Keep a reference to that promise.
  3. If the same resource is requested again, just return the same promise.
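The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not the code from the screenshot: `connect`, `createChannel` and `assertExchange` are hypothetical stand-ins that only count their calls, and clearing the cached promise on failure is my own addition.

```javascript
// Hypothetical stand-ins for real setup steps; they only count how often they run.
const calls = { connect: 0, createChannel: 0, assertExchange: 0 };
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function connect() {
  calls.connect += 1;
  await delay(5);
  return { name: 'connection' };
}

async function createChannel(connection) {
  calls.createChannel += 1;
  await delay(5);
  return { connection };
}

async function assertExchange(channel, exchange) {
  calls.assertExchange += 1;
  await delay(5);
  return exchange;
}

// Step 1: one function that runs the entire initialization.
async function createClient() {
  const connection = await connect();
  const channel = await createChannel(connection);
  const exchange = await assertExchange(channel, 'events');
  return { connection, channel, exchange };
}

// Step 2: the shared reference to the in-flight (or settled) promise.
let clientPromise = null;

// Step 3: every caller gets the same promise. Not `async`, so the check and the
// assignment happen in the same synchronous block.
function getOrCreateClient() {
  if (!clientPromise) {
    clientPromise = createClient().catch((error) => {
      clientPromise = null; // forget a failed attempt so a later call can retry
      throw error;
    });
  }
  return clientPromise;
}

Promise.all([getOrCreateClient(), getOrCreateClient(), getOrCreateClient()]).then((clients) => {
  console.log(clients[0] === clients[2], calls.connect); // true 1
});
```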

OK, but why does it fix the problem?

The idea is to make sure the check and the assignment run synchronously, and to let the promises settle in their own time. To reason about promise usage, think in terms of synchronous execution blocks: no other code can run between two statements unless there is an await (or some other return to the event loop) between them.
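If that sounds abstract, this small sketch (with hypothetical `step` and `run` functions) shows where the synchronous block ends: an async function runs synchronously up to its first await, and only then hands control back to the caller.

```javascript
// Logs when the function starts and when it resumes after its first await.
async function step(name, log) {
  log.push(`${name}: start`); // runs synchronously, inside the caller's block
  await null; // control returns to the caller here
  log.push(`${name}: resumed`);
}

async function run() {
  const log = [];
  const a = step('a', log);
  const b = step('b', log);
  log.push('sync block done');
  await Promise.all([a, b]);
  return log;
}

run().then((log) => console.log(log.join(', ')));
// a: start, b: start, sync block done, a: resumed, b: resumed
```

Anything placed before the first await, such as the existence check and the assignment of the promise, is guaranteed to finish before any other caller gets a turn.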