The documentation for the PySpark top() function has this warning:
This method should only be used if the resulting array is expected to be small, as all the data is loaded into the driver's memory.
This piqued my interest: why would you need to bring all the data to the driver, if all you need is a few top elements?
The answer is: it does not load all the data into the driver’s memory.
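One way a top-N query can avoid collecting everything is to keep only the N largest elements of each partition and then merge those short lists. The sketch below illustrates that idea in plain Python, with ordinary lists standing in for partitions. The names `top_of_partition`, `merge` and `top` are made up for this illustration, and this is not presented as Spark's actual code.

```python
import heapq
from functools import reduce


def top_of_partition(partition, num):
    """Keep only the `num` largest elements of one partition."""
    return heapq.nlargest(num, partition)


def merge(a, b, num):
    """Combine two short candidate lists and keep the `num` largest."""
    return heapq.nlargest(num, a + b)


def top(partitions, num):
    """Top `num` elements across all partitions, largest first.

    Only lists of at most `num` elements are ever combined, so the
    merging side never holds a whole partition, let alone the dataset.
    """
    candidates = [top_of_partition(p, num) for p in partitions]
    return reduce(lambda a, b: merge(a, b, num), candidates, [])


if __name__ == "__main__":
    # Hypothetical data: three "partitions" of an imaginary dataset.
    print(top([[5, 1, 9], [7, 3], [8, 2, 6]], 3))
```

With this approach the data reaching the merging side is bounded by the number of partitions times N, not by the size of the dataset.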
Spark jobs were failing. All of them. The data pipeline had stopped. This is a tale of high-pressure debugging.
Do you sometimes want to access your home computer from an outside network? Maybe you use another system, but you do not trust it and would prefer your home computer for some workflows?
This post outlines the steps to make such access possible.
This post is about a program hang. The hang was in the Python process that was running Ansible scripts. The problem was hard to debug and sent me back to my Unix textbook.
Can you see the problem with this code? It comes from Ansible, v22.214.171.124.
It’s quite straightforward. It checks if a directory path exists. If it does not, then it creates the directory path, similar to mkdir -p. What could be wrong?
A continuous test server I’d set up had stopped working. The XenServer on which it was running had a 1TB disk, and it was full. What was going on?
This post shows you how to rotate old logs from your application. It requires no change to application code and no specialized logging library or framework. It works for any language, on any standard Unix platform.
Can you quickly close a TCP connection in your program by sending a reset (“RST”) packet?
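One commonly used way to do this on POSIX systems is to enable the SO_LINGER socket option with a zero timeout before closing, which makes the stack abort the connection with an RST instead of the usual FIN handshake. The sketch below is a minimal illustration and not necessarily the approach the post takes. The helper name `close_with_reset` is hypothetical, and the `"ii"` struct layout assumes a platform where `struct linger` is two C ints, as on Linux.

```python
import socket
import struct


def close_with_reset(sock):
    """Close a connected TCP socket abortively, so the peer sees a reset.

    SO_LINGER with l_onoff=1 and l_linger=0 tells the kernel to discard
    any unsent data and send an RST on close().
    """
    # Assumed layout: struct linger { int l_onoff; int l_linger; }
    linger = struct.pack("ii", 1, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger)
    sock.close()
```

A peer blocked in `recv()` on the other end of such a connection would typically get a "connection reset" error, not a clean end-of-stream.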
This is a story of a program that worked, until it broke on a 64-bit platform.
It is true. Even on the latest FreeBSD 11.0 (I checked the source tree).
I use Emacs (emerge) while merging files. Today, when trying to merge some Python code, I found that it was taking an exceedingly long time. It was blank for 5 minutes and counting.
This is a story of multiple processes running on a system, but with empty PID files.
It took a lot of debugging.
Can you tell me when a shell sends a “HANGUP” signal to a process? What happens if there is a pipeline? What if you prefix this pipeline with the “nohup” command?
If you don’t do log rotation right, you may have a full hard drive and a ghost file.
Have you run into a problem that happens only once in a while, and you never seem to have enough information to figure it out? We all have. This is one such tale: except that it has a happier ending.
As programmers, we face it many times: a build fails on only our system. This is one such tale.
This is a tale about a hopelessly bad error message.