Intermittent Learning Log
A running record of what I learned, built, or encountered for the first time.
October 30, 2025
Work of the day: Today I dove deep into how venv and pyenv redirect Python calls via the PATH and shims.
Takeaway: Helped me see clearly how developers use environment variables and shims to stitch programs together and build flexible, robust environments.
See the blog post for more.
October 29, 2025
Since I've been away: I've been buried deep in a research project the last couple weeks, which has helped me clarify my role as a researcher, how programming and data engineering fit into that work, and the tools in a researcher's toolbox.
Takeaway: Clarified my mission statement — "turning opacity into clarity."
See the blog post for more.
October 10, 2025
Work of the day: Locked down my Google service account secrets file.
See the blog post for the gory details.
October 9, 2025
This entry was later expanded into a blog post: Seeing beneath the interface: The layers of Google Sheets.
Work of the day: Continuing work on the Google Access application, I dove into the architecture of the google.oauth2 and googleapiclient packages.
What I saw: Studying these packages helped me see the deeper structure of Google Sheets, and, by extension, most modern digital applications.
- Data layer: The foundation where the raw data lives. For Google Sheets, it might resemble an internal object with key-value pairs, such as {"A1": "Name", "B1": "Phone"...}. This layer is the steady contract that the rest of the layers depend on.
- Logic/Behavior layer: Defines the rules and operations that act on the data. In Sheets, this includes formulas, named ranges, and recalculation behavior, and it is written largely in C++, Java, or Go.
- Presentation/GUI layer: Handles how users interact with the data via the GUI. Typing, selecting cells, using menus, etc. Built primarily with JavaScript/TypeScript, HTML and CSS.
- API/Integration layer: Provides controlled access via APIs so other systems and developers can interact with the data.
- Security layer: A meta layer that spans all others, responsible for authentication, permissions, quotas, and overall system integrity.
Understanding the bone and muscle beneath the surface of Google Sheets lets me see past the blinking lights of the GUI and API to the true payload: the data itself. This perspective also clarified my place in the system: building operational, reliable pipelines and integrations that transform data into actionable information.
October 8, 2025
Work of the day: Continuing work on the Google Access application, I refined my project folder structure and wrote a function to download Google Sheets data as a CSV.
Takeaway: I now understand how __init__.py controls namespaces for my package, and when to promote functions to the top level. Rule of thumb, only promote functions when they are used frequently across scripts. Conceptually, it's simply an address. For certain addresses that receive a lot of mail, such as the White House, it makes sense for the USPS to deliver mail simply addressed to The White House, Washington, D.C. But for mail sent to an ordinary residential home, we have to write the street name and number.
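A minimal sketch of what "promoting" a function looks like. The package name mail_demo and the function deliver are hypothetical, not from my project. The script builds the package in a temporary folder so it runs on its own:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. All names here are hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "mail_demo"
(pkg / "routes").mkdir(parents=True)

# The function lives at its full "street address": mail_demo.routes.local
(pkg / "routes" / "__init__.py").write_text("")
(pkg / "routes" / "local.py").write_text(
    "def deliver(name):\n"
    "    return f'Delivered to {name}'\n"
)

# Promotion: the top-level __init__.py re-exports the function, so callers
# can use the short address mail_demo.deliver.
(pkg / "__init__.py").write_text(
    "from .routes.local import deliver\n"
    "__all__ = ['deliver']\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import mail_demo
from mail_demo.routes.local import deliver as full_address_deliver

# Both addresses reach the same function object.
same_function = mail_demo.deliver is full_address_deliver
message = mail_demo.deliver("The White House")
```

The long import path keeps working after promotion, so promoting a function later does not break existing scripts.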
October 7, 2025
Work of the day: Turning back to my google-access application, I created CODEMAP.md for quick reference to all the functions I have in the directory.
src/
  google-access/
    ingest/
      ingest.py
        [ ] download_item – download item
    io/
      auth.py – get into google drive
        [x] authenticate – authenticate google service account with SCOPES
        [x] build_service – build service
      drive.py – navigate in drive
        [x] get_folder_items – returns a list of dicts of folder_item: metadata
    transform/
    utils/
      constants.py
        [x] SCOPES : list
        [x] MIME_TYPES : dict
      log_utils.py
        [x] report – report with message and level. optionally print
        [x] logging_config – configure global logging for app
      text_utils.py
        [x] remove_diactritics
        [x] cannon – canonicalize a filename or text string for matching
projects\
  cacciamani-request\
    src\
      file_transforms.py
        [x] combine_pdfs – merge pdfs
        [x] remove_pages – remove pages from pdf
October 6, 2025
Work of the day: Git Day! I dug deep into how to safely push branches to GitHub in both individual and collaborative projects.
Added superpowers: I can now use Git to restore changes, navigate and investigate commits, and diagnose and fix branch divergences.
Key things I learned:
- Use git reset --mixed when local main is ahead of remote main to undo commits while keeping changes, then git stash, and finally git reset --hard origin/main to realign local main with remote.
- Use git pull --ff-only when local main is behind remote main.
- Use git rebase origin/main to replay commits from the current branch on top of the latest commit from origin/main, which is critical in collaborative work to avoid overwriting others’ changes.
- Use git log --graph --decorate --oneline --all to visualize all local and remote branches and their commits.
- Use git log to review branch history, git show <commit> to inspect the details of a specific commit, and git diff to investigate changes in the working tree.
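To decide which of the commands above applies, it helps to know whether local main is ahead, behind, or diverged. Here is a rough sketch that reads the counts from git rev-list --left-right --count. The function names and the mapping from counts to suggestions are my own assumptions, not a standard tool:

```python
import subprocess


def parse_counts(output: str) -> tuple[int, int]:
    """Parse the two numbers printed by git rev-list --left-right --count.

    With the range origin/main...main, the left number counts commits only
    on origin/main (behind) and the right number counts commits only on
    main (ahead).
    """
    behind, ahead = (int(field) for field in output.split())
    return behind, ahead


def count_behind_ahead(remote: str = "origin/main", local: str = "main") -> tuple[int, int]:
    """Ask git how many commits each side has that the other lacks."""
    output = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{remote}...{local}"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_counts(output)


def suggest_next_step(behind: int, ahead: int) -> str:
    """Map the counts to the commands from my notes (a hypothetical mapping)."""
    if behind and ahead:
        return "diverged: git rebase origin/main"
    if behind:
        return "behind: git pull --ff-only"
    if ahead:
        return "ahead: push, or undo with git reset --mixed and realign with git reset --hard origin/main"
    return "in sync: nothing to do"
```

Inside a repository, running git fetch origin first and then suggest_next_step(*count_behind_ahead()) would give the hint for the current state.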
Extra credit: I now know how to check which protections origin/main has, if any.
October 3, 2025
Work of the Day: I mostly worked on this blog today. I wrote the October 2 entry below processing lessons learned from my first real data engineering project, turned it into a blog post, and updated my Scorecard. Hey, it takes work to look this pretty!
October 2, 2025
This entry was later expanded into a blog post: Data engineering is all about the feels.
At the end of the day, code and data tools are just the foundation of data engineering. What matters most is how people feel about us and the interfaces we build. Do they feel they can trust us to build systems that safeguard their data? Do they feel comfortable accessing and inputting information? Do they feel like they are listened to? Therein lies the real work of data engineering.
TL;DR: This week: trust, soft skills, and a hard lesson in data loss.
What I'm working on: I’m building a single source of truth for all our member information. Right now, that data is scattered across Google Sheets, digital folders, and even physical files. The project is about more than consolidation; it’s about building reliability and confidence in how we access and use that information.
Tools I'm using:
- Python:
  - Data handling: pandas, numpy, re, unicodedata
  - Storage: sqlite3
  - Utilities: pathlib, typing, logging
  - Documents: PyPDF2
  - Google API access: gspread, google-auth, googleapiclient
- Platforms: VS Code, Google Sheets, Google Drive, Google Cloud Console.
What I've learned:
- The power of soft skills. Building a successful data pipeline involves way more than solid technical skills. Listening, communicating clearly, solving problems with flexibility and patience, and managing expectations are just as critical as writing code. My clients are my teammates. If they don’t trust the security of the data or feel comfortable entering and retrieving it, no technical wizardry will matter.
- Back up. And never forget that trust is the real currency. We had two data loss incidents. In one case, I managed to recover everything after hours of work. In the other, part of the dataset was gone for good. Both shook confidence in the system. What saved me was the credibility I had built over years of showing up and delivering. In a way, I spent that “stored trust” like currency.
The clearest lesson of all: more so than any language, platform or technique, data engineers depend on the trust our clients have in us. Files can often be restored, but trust is much harder to rebuild.
September 28, 2025
Work of the Day: I’m building tools that:
- Inventory PDFs and JPGs stored in the campaign’s shared Google Drive (which I set up and administer).
- Convert JPGs to PDF.
- Combine and separate PDFs based on specifications.
One thing that broke: I built functions for transforming PDFs in src/utils/transforms/file_transforms.py and created a pyproject.toml file, but was unable to install the package in editable mode. After much pushing and pulling, I discovered the failure was caused by renaming the project’s root folder after creating the virtual environment. On Windows, the file path at the time of creation is hardcoded into pip.exe, so pip was searching for a path that no longer existed.
How I fixed it: A workaround is to run pip via Python (python -m pip install ...) instead of calling pip directly. I opted for a cleaner, permanent solution. I removed and rebuilt the virtual environment to reset the hardcoded paths.
Takeaway: Virtual environments in Windows hardcode their creation path into executables like pip.exe. If the project root is renamed or moved, those shims break. To avoid surprises, I’ll either recreate the venv after structural changes or consistently use python -m pip to bypass the shims.
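The same workaround can be used from inside a script. This is a small sketch, with hypothetical helper names, that always routes pip through the interpreter that is currently running instead of relying on the pip executable:

```python
import subprocess
import sys


def pip_command(*args: str) -> list[str]:
    """Build a pip command that runs through the current interpreter."""
    # sys.executable is the path of the Python that is running this script,
    # so the command does not depend on a path baked into pip.exe.
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args: str) -> subprocess.CompletedProcess:
    """Run pip and raise CalledProcessError if it exits with an error."""
    return subprocess.run(pip_command(*args), check=True)
```

For an editable install this would be called as run_pip("install", "-e", "."), assuming pip is available in the active environment.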
Side Effect: I took this opportunity to deepen my understanding of pyenv. pyenv allows for easy standardization of the Python version across users. By running pyenv local <version>, I create a .python-version file in the project root directory. That file can be checked into version control, so anyone who clones the repo knows exactly which interpreter version to use.
September 25, 2025
Since I've been away: We got a large, urgent request for workers at my job. I needed to quickly build a system to intake hundreds of form responses, assign those responses to our agents, and capture data from the agents' conversations with the respondents. Since time was of the essence, I built a less than perfect application using our agents' preferred tool, Google Sheets. Then I went to work putting out fires while making the system more robust and automated. It felt like I was trying to repair and improve a train while it was running at full speed. There were errors both small and large, and I learned a lot.
Next step is to build a program that authenticates and connects to my Google Drive so I can automate and harden the system.
At the same time, my wife and I prepared our home to bring my mom to live with us. We bought one of those big plastic sheds, paid my step-son and his wife to set it up, cleared out the spare bedroom, ordered a hospital bed and bought all the supplies needed for my mom. We were all set to bring her home this past Saturday. Then she passed away on Thursday. I'll write more about this. I really wanted her to be with us. And also, I'm happy for her. She was ready to go. She had done all she needed to in this life. I'll share more in a post later. I miss you, mom. Thank you for everything. Beings die so that other beings have a chance to live.
September 8-9, 2025
Work of the day(s): Inspired by Matt Godbolt's lecture on machine code, I built a CPU emulator in Python with its own pseudo-assembly language.
New skill of the day(s): I dove deep into tuple unpacking with *args.
Skills practiced/developed:
- Designed the execution logic by identifying potential errors for each operation, grouping them, and writing helper functions to validate input or raise errors.
- Built comprehensive data validation and error handling.
- Documented the program thoroughly.
Next tasks:
- Set up the project structure including pyproject.toml.
- Write two different programs in the "toy" language that build Fibonacci series.
- Build tests to run with pytest.
- Expand the toy language with flags, comparisons, conditional branching, and additional capabilities.
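To show how tuple unpacking with *args fits an emulator like this, here is a tiny sketch. The instruction names, registers, and helper functions are all hypothetical and far simpler than my actual program:

```python
def require_register(registers, name):
    """Helper that validates a register name or raises a clear error."""
    if name not in registers:
        raise ValueError(f"Unknown register: {name}")


def op_load(registers, reg, value):
    """LOAD <reg> <value>: put an integer into a register."""
    require_register(registers, reg)
    registers[reg] = int(value)


def op_add(registers, dest, src):
    """ADD <dest> <src>: add src to dest and store the sum in dest."""
    require_register(registers, dest)
    require_register(registers, src)
    registers[dest] += registers[src]


OPS = {"LOAD": op_load, "ADD": op_add}


def run(program):
    """Execute a list of instruction strings and return the registers."""
    registers = {"A": 0, "B": 0}
    for line in program:
        # Starred unpacking: the first word is the operation name and
        # everything after it is collected into the list args.
        name, *args = line.split()
        if name not in OPS:
            raise ValueError(f"Unknown instruction: {name}")
        # The star spreads args back out as positional arguments.
        OPS[name](registers, *args)
    return registers


result = run(["LOAD A 2", "LOAD B 3", "ADD A B"])
print(result)  # {'A': 5, 'B': 3}
```

The star appears twice: once to collect the operands while splitting the line, and once to spread them out when calling the operation.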
September 5, 2025
Work of the day: Set up ruff configs in pyproject.toml and ran ruff in my eli-tools project.
New skill of the day: Learned to configure and run ruff as a linter and style checker.
What broke and how I fixed it: The Even Better TOML extension was crashing repeatedly. I went into VS Code settings and disabled schemas for Even Better TOML.
Aha moment of the day 1: pyproject.toml is a central config hub for my project. It stores settings for internal tools such as pytest and ruff. External tools such as pip use it to read metadata and configuration about my project.
Aha moment of the day 2: VS Code settings are a series of JSONs. I can configure VS Code on global, user and workspace levels.
September 4, 2025
Work of the day: Started to develop a workflow for publishing changes to my editable package eli-tools.
New skill of the day: Learned to build and implement tests using pytest.
September 2, 2025
Work of the day: Built, pip installed, and imported my first Python package as an editable install! The package includes a function that uses Path.glob() and returns a single path or raises an error if Path.glob() matches zero or multiple paths.
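A sketch of what such a function might look like. The name find_one and the choice of exception types are assumptions made for illustration, not the code in my package:

```python
import tempfile
from pathlib import Path


def find_one(directory: Path, pattern: str) -> Path:
    """Return the single file matching pattern, or raise if there is not exactly one."""
    matches = [p for p in directory.glob(pattern) if p.is_file()]
    if not matches:
        raise FileNotFoundError(f"No file matches {pattern!r} in {directory}")
    if len(matches) > 1:
        names = ", ".join(sorted(p.name for p in matches))
        raise ValueError(f"Expected one match for {pattern!r}, found: {names}")
    return matches[0]


# Demo in a temporary folder so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "report.csv").write_text("a,b\n")
    (folder / "notes_1.txt").write_text("x")
    (folder / "notes_2.txt").write_text("y")

    found_name = find_one(folder, "*.csv").name  # exactly one match

    try:
        find_one(folder, "*.txt")  # two matches, so this raises
        multiple_error = None
    except ValueError as exc:
        multiple_error = str(exc)
```

Raising on both zero and multiple matches means the caller never silently works with the wrong file.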
Insight of the day: In building my own packages, I need to adopt a clear versioning and release plan to minimize risks for anyone depending on my code, including myself.
Tangent of the day: I see how someone could infect a computer by slipping a malicious .pth file into site-packages. Because Python executes code inside .pth files at startup when any line starts with import (see site documentation), the attacker could make it load a doctored version of, say, pandas whenever the user tries to import it.
August 27-29, 2025
Work of the day(s): Completed the project to track and report on reconstruction projects in the towns of Yabucoa and Humacao.
Takeaway of the day(s): I gained clarity on Python's function parameters.
Most important thing to work on: Build fluency with vectorization in pandas.
Feeling of the day(s): I am seeing the logic and patterns behind pandas, whereas before it felt more like I was stumbling around in the dark.
Next project: Start to build up my tools.
Concrete learnings:
- Python
- I'm now comfortable writing clean and clear function parameters.
- Wrote to an Excel file from Python for the first time.
- Learned how to create and use masks in pandas to filter results.
- Gained fluency with navigating through and getting information from groupby objects.
- Use path.glob(ptrn) to return an iterator with zero or more path objects. Include the extension in ptrn and use path.is_file() to filter out unwanted matches. Use rglob() to search recursively.
- Convert the iterator of path objects into a list, tuple, or set when I need all results, or use next() or tuple unpacking when I just need a specific result.
- Using raise Error statements stops execution when conditions aren't met and provides clear, intentional error messages.
- Using list comprehension with two for loops allows me to compress several lines of code into one.
  Formula: [expression_using_outer_and_inner_item for outer_item in outer_iterable for inner_item in inner_iterable]
  Example: [p for ptrn in patterns for p in dir_path.glob(ptrn)]
- Unpacking tuples provides precision and flexibility when working with iterables. (x,) = iterable asserts exactly one item and raises if the iterable contains 0 or multiple items.
- Creating variable names dynamically may look like a shortcut, but it introduces major risks, including code injection, brittle downstream logic, unreadable code, and higher chances of bugs and naming collisions.
- Dividing a program into single-use functions.
- Making my code more transparent and making errors and unwanted results more diagnosable by using error handling, logging, and printing during runtime.
- Logging and error handling.
- Defining function boundaries and planning the flow.
- Designing the Preview → Confirm → Apply pattern.
- robocopy flags and behavior.
- Argument arrays vs. strings in PowerShell, specifically with robocopy.
- Switch defaults.
- Clear function types:
- Core helpers: Pure logic. No I/O.
- Adapters: Thin wrappers around tools.
- Orchestrators: Call adapters in order, handle user interaction, collect results.
- Main: Top-level driver that stitches steps together.
- Wrote a simple “function contract” (inputs, preconditions, side effects, outputs, errors).
- Prompting the user for input and incorporating it into the program.
- Reading, writing, and modifying text files.
- Working with arrays and strings.
- Constructing argument arrays cleanly.
- Building path-segment regexes for exclude lists.
- Using robocopy preview mode (/L) before the real copy; excluding dirs/files with /XD and /XF.
- Developed Python project directory structure incl:
- Project startup checklist
- pyproject.toml
- Git workflow and common command cheatsheet
- CHANGELOG.md
- README.md
- set(my_dict) returns a set consisting of the dictionary's keys (same goes for list(my_dict) and tuple(my_dict)).
- Mutating vs non-mutating iterable methods: As a general rule, use mutating methods (e.g. set.update(), list.append()) when I want to alter an iterable in place. Use non-mutating methods (e.g. set.union(), new_list = list1 + list2) when I want to create a new iterable without altering the original. Be intentional! Performance and behavior differ: mutation generally has less overhead and keeps existing references intact, while non-mutating methods allocate new objects.
- Use the unpack operator * to efficiently iterate through an iterable.
- Use for _ in range(n) to indicate that I will not be using the placeholder.
- Two ways to create a dictionary from two lists:
  - dict(zip(list1, list2)) is the most straightforward method.
  - Use {x: y for x, y in zip(list1, list2)} when I need to transform the values (e.g. {...: y/100 for x, y...}), filter data (e.g. {... in zip(list1, list2) if y > 80}), or alter the keys (e.g. {x.upper(): ...}).
- set.union() is an efficient and simple method for combining two or more sets.
- Use the timeit module to effectively measure performance of small snippets.
- Use for _ in range(n), as opposed to for i in range(n), when I want to communicate that I will not be using the placeholder.
- The here-string syntax marker @' ... '@ allows me to define and store multiline strings as variables to use later in Python commands.
- The -m <module> option tells the Python executable to find a module on the import path and run it as a script, which gives command-line access to tools that live inside a module without writing a .py file that imports it.
- py and python are different commands. python tells PowerShell to run whatever file named python appears first in my PATH. In my case, this is a pyenv-win shim (python.bat) that wraps and launches the actual python.exe. py, meanwhile, tells PowerShell to run the Python Launcher for Windows (py.exe), which directly starts python.exe without going through a batch wrapper. I had a case where running Python through the batch wrapper was creating an unexpected effect with a multiline here-string. When I switched from python to py, the command worked as expected.
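Several of the Python idioms from the list above, gathered into one runnable sketch. The file names, people, and scores are made-up sample data:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    dir_path = Path(tmp)
    for name in ("a.pdf", "b.pdf", "c.jpg", "d.txt"):
        (dir_path / name).write_text("")

    # Two for clauses in one comprehension: the outer loop walks the
    # patterns, the inner loop walks the matches for each pattern.
    patterns = ["*.pdf", "*.jpg"]
    matches = sorted(p.name for ptrn in patterns for p in dir_path.glob(ptrn))

    # Tuple unpacking asserts exactly one match. It raises ValueError
    # if glob yields zero or several paths.
    (only_jpg,) = dir_path.glob("*.jpg")
    only_jpg_name = only_jpg.name

names = ["ana", "luis", "mar"]
scores = [95, 72, 88]

# Straightforward pairing of two lists.
plain = dict(zip(names, scores))

# A dict comprehension when keys or values need work or filtering.
passing = {n.upper(): s / 100 for n, s in zip(names, scores) if s > 80}

# Converting a dict to a set keeps only the keys.
keys = set(plain)

print(matches)        # ['a.pdf', 'b.pdf', 'c.jpg']
print(only_jpg_name)  # c.jpg
print(passing)        # {'ANA': 0.95, 'MAR': 0.88}
```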
August 26, 2025
Work of the day: Started a project to track and report on reconstruction projects in the towns of Yabucoa and Humacao.
Takeaway of the day: path.glob(ptrn) returns an iterator of path objects that match the given pattern.
Most important thing to work on: Build fluency with pathlib to streamline file and directory access - job #1 for almost all data projects.
Narrative of the day: I struggled today with error handling, logging, and working with the iterators returned by path.glob(). Even though I didn't get as far as I would have liked with my project, I have developed a much clearer mental picture of these three areas of writing clean Python code.
Mistake of the day: Creating dynamic variable names.
Concrete learnings:
- Python
Practiced:
- General programming
August 25, 2025
Work of the day: Building a PowerShell program to bootstrap a new data project by copying a project scaffold/template.
Takeaway of the day: Functions should be single-purpose! One function = one job.
Most important thing to work on: Program architecture — divide work into single-purpose functions, categorize them by use case (helper, adapter, orchestrator, main), and knit them together cleanly.
Struggled with:
Progress made:
- General programming
- PowerShell
August 17-18, 2025
Data Science
Now I'm ready to build my first robust data science project.
August 15, 2025
Takeaway of the day: Learned how to use timeit to test performance.
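A small sketch of the kind of comparison timeit makes easy. The two functions are hypothetical stand-ins for whatever snippets I want to compare, and the timings will differ from machine to machine:

```python
import timeit


def with_loop():
    """Build a list with an explicit for loop and append."""
    result = []
    for n in range(1_000):
        result.append(n * 2)
    return result


def with_comprehension():
    """Build the same list with a list comprehension."""
    return [n * 2 for n in range(1_000)]


# timeit.timeit accepts a callable and runs it number times,
# returning the total elapsed time in seconds.
loop_seconds = timeit.timeit(with_loop, number=2_000)
comp_seconds = timeit.timeit(with_comprehension, number=2_000)

print(f"loop:          {loop_seconds:.4f} s")
print(f"comprehension: {comp_seconds:.4f} s")
```

Checking that both functions return the same result before comparing their speed keeps the benchmark honest.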
Python
PowerShell
August 13, 2025
Takeaway of the day: Before, I had used a "good-enough" approach to writing Python. I saw today that I will be able to write more efficient code faster by learning to write pythonically.
Code of the day:
set_of_keys = set().union(*list_of_dicts)
Python
- I used to combine lists using list3 = list1 + list2 because it was easier to understand than list1.extend(list2). After learning that list.extend() is faster, uses less memory, and maintains references, I committed to using it as my default list builder, and worked with it until I really understood it.
- Use (*my_list) to unpack an iterable.
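Here is the code of the day in action, along with a quick check of what "maintains references" means for list.extend(). The sample dictionaries and the total function are made up for illustration:

```python
list_of_dicts = [
    {"name": "Ana", "phone": "555-0100"},
    {"name": "Luis", "email": "luis@example.com"},
    {"town": "Yabucoa"},
]

# The star unpacks the list so each dict becomes its own argument to
# union(). Iterating over a dict yields its keys, so the result is
# every key that appears in any of the dicts.
set_of_keys = set().union(*list_of_dicts)

# extend() changes the existing list, so other names bound to it see
# the change. The + operator builds a brand new list instead.
list1 = [1, 2]
alias = list1
list1.extend([3, 4])
list3 = list1 + [5]


def total(*numbers):
    """Accept any number of positional arguments and add them up."""
    return sum(numbers)


print(sorted(set_of_keys))  # ['email', 'name', 'phone', 'town']
print(alias)                # [1, 2, 3, 4]
print(list3 is list1)       # False
print(total(*list1))        # 10
```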
August 12, 2025
Takeaway of the day: I now feel comfortable using Chrome's devtools Network panel, and know where to look for network requests.
Web scraping
- Explored Chrome's DevTools Network panel, located my network requests and got responses from API endpoints in PowerShell and Python.
PowerShell
- PowerShell's Execution Policy helps protect us from running malicious scripts. As a developer, I will mostly set my policy to RemoteSigned, which requires downloaded .ps1 scripts to be signed by a trusted publisher while allowing me to run my own local scripts.
August 11, 2025
Takeaway of the day: RESTful APIs — I finally understand you.
Reflection of the day: Now that I have a solid grasp of how to use Git to collaborate on projects, I can start contributing to open source work to hone my skills and gain real-world experience.
Data Science
- After exploring OpenFEMA, I finally understand what a RESTful API is and how to make use of it in Python programs and command-line tools.
- Next steps: Apply what I know about http, Python's requests and pandas libraries, and git to build a robust, flexible tool that provides actionable, real-time data on Puerto Rico’s recovery from Hurricane Maria and other disasters.
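A rough sketch of what a first request could look like. I am using only the standard library here instead of requests and pandas so the example has no dependencies. The endpoint URL, the $filter and $top parameter names, and the response key are assumptions from my reading of the OpenFEMA documentation and should be checked there before relying on them:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed endpoint. Verify against the OpenFEMA documentation.
BASE_URL = "https://www.fema.gov/api/open/v2/DisasterDeclarationsSummaries"


def build_url(state: str, top: int = 10) -> str:
    """Build a query URL that filters declarations by state code."""
    params = {"$filter": f"state eq '{state}'", "$top": top}
    # Keep $ and ' readable and encode spaces as %20.
    query = urlencode(params, safe="$'", quote_via=quote)
    return f"{BASE_URL}?{query}"


def fetch_records(url: str) -> list:
    """Download the JSON payload and return the list of records."""
    with urlopen(url, timeout=30) as response:
        payload = json.load(response)
    # Assumed response key, named after the dataset.
    return payload.get("DisasterDeclarationsSummaries", [])


url = build_url("PR", top=5)
print(url)
```

Calling fetch_records(url) would perform the actual GET request. The resulting list of dictionaries could then be handed to pandas.DataFrame for analysis.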
Git
- Dev teams use GitFlow or GitHub Flow frameworks and strict branch naming conventions to keep projects organized. I can emulate these practices in my own solo data analysis projects to maintain a clean workspace and share my work effectively.
- Understanding how Git’s HEAD files track the most recent commit helps demystify Git and lets me use the tool more precisely.
August 8, 2025
Takeaway of the day: Git Day 2! I learned how to use branches and restore to safely debug, refactor, and experiment on my programs.
Git
- In addition to branches and restore, I learned how to use status and log to investigate the state of my repository.
- Got my first look inside .git/config, the magic behind Git.
Blog
- Published a post: Git as Railroad
August 7, 2025
Takeaway of the day: Git Day!!! I finally grasp git. git init builds a local train station. gh repo create builds a remote train station. git add gathers up loose cargo (changes since last add) and loads them on a new train car. git commit takes those cars and connects them to the train engine at the station. The first git push builds the tracks to the remote station and sends the trains along the track.
Reflection: There are a lot of improvements I can still make to my PowerShellProfile.ps1 file, but it is robust and accomplishes a lot and works. I’m sure I’ll keep refining it, but it’s solid and useful. Time to move on.
Git
- Grasped the fundamentals of how git works and how I can use it as a solo programmer to keep my projects tidy and portable.
- Learned how team projects use upstreams, forks and branches to work concurrently and resolve code conflicts.
- Committed to a basic framework for keeping my git clean: git push at the end of each session, and git fetch origin with git log origin/main..main at the start of each session.
- Learned other basic commands to investigate git status, including status, branch, log --oneline, and show.
PowerShell
- Debugged New-PyProject.ps1
- Learned that private confines a variable to the context in which it is created.
August 6, 2025
Takeaway of the day: Today was chill, after the mental marathon of the past two days. Focused on reviewing my recent learnings, deepening my understanding of the New-PyProject function, and tinkering with PowerShell.
PowerShell
- Learned that & runs a command (like a system call), throw prints an error and stops the script, and [CmdletBinding()] adds features to a function like -Verbose and -WhatIf.
- Did a deep dive into function creation and how to read and use parameters.
- foreach ($item in $object) is the standard loop syntax, while ForEach-Object is used in pipelines (e.g., ... | ForEach-Object { }).
- Differentiated three ways to build paths:
  - Join-Path, the native command, which will be my default.
  - + for simple strings.
  - and [System.IO.Path]::Combine(), the .NET method.
Git
- Installed the GitHub CLI and authenticated, so I can now create repos and push from my terminal.
August 5, 2025
Takeaway of the day: A lot of computer science is about conventions. Let's keep names standard so that we can communicate well across systems.
.NET
- Used [System.Environment]::GetEnvironmentVariable() and [System.Environment]::SetEnvironmentVariable() to clean up my Python environment by removing Python 3.10 from my user path variable.
- Used [System.IO.Path]::Combine() to create robust path objects.
- Used [System.Management.Automation.CompletionResult]::New() to create autocompletion functionality in a custom PowerShell command.
PowerShell
- Created new cmdlets and aliases using PowerShell conventions.
- Used the ForEach loop construct to loop through an array of directory names.
- Used the @ array constructor.
- Built an Argument Completer and a Parameter Validator into my New-PyProject custom function.
- Learned:
  - When piping commands that perform an action but are not meant to pass values, best practice is to use the Out-Null cmdlet to avoid passing unintended values.
  - About PowerShell's three variable scopes: Global, Script, and Local.
  - That PowerShell resolves the first word of a command in this order: aliases → functions (PROFILE or current script) → cmdlets → external executables (like python or git). If no match is found, it throws an error.
Windows OS
- Grasped that Environment Variables:
  - let external programs know where to save directories and files.
  - allow us to build portable scripts that leverage conventions and Environment Variables with code such as Join-Path $env:USERPROFILE "Downloads".
  - and provide a map that the terminal uses, via PATH and PATHEXT, to execute external executables.
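To make the PATH and PATHEXT "map" concrete, here is a simplified sketch of the lookup written in Python. The function resolve_command is hypothetical and leaves out details of the real Windows rules (such as case-insensitive matching and the current directory). In real code, shutil.which performs this kind of search:

```python
import tempfile
from pathlib import Path


def resolve_command(name, path_dirs, pathext):
    """Return the first file that matches name in PATH order, or None.

    Directories are searched in order. Within each directory the
    extensions from pathext are tried in order, unless the name
    already has an extension.
    """
    candidates = [name] if Path(name).suffix else [name + ext for ext in pathext]
    for directory in path_dirs:
        for candidate in candidates:
            full_path = Path(directory) / candidate
            if full_path.is_file():
                return full_path
    return None


# Demo: a shim folder that comes before the real install on PATH.
with tempfile.TemporaryDirectory() as tmp:
    shims = Path(tmp) / "shims"
    install = Path(tmp) / "install"
    shims.mkdir()
    install.mkdir()
    (shims / "python.bat").write_text("")
    (install / "python.exe").write_text("")

    found = resolve_command("python", [shims, install], [".exe", ".bat"])
    found_in = found.parent.name
    found_name = found.name
    missing = resolve_command("ruby", [shims, install], [".exe", ".bat"])

print(found_in, found_name)  # shims python.bat
```

Because directory order wins over extension order, the shim in the earlier folder is found even though .exe comes first in the extension list.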
Python
- Installed pyenv-win to ensure each project runs with the correct Python version, independent of system settings.
August 4, 2025
Takeaway of the day: Building the custom function New-PyProj gave me the visceral sense of PowerShell's power — it’s like a control tower for the entire system, and it is endlessly extensible.
PowerShell
- Created a PowerShell profile and built a function, New-PyProj, that sets up a new Python project with folder structure, Git initialization, and a virtual environment using pyenv-win.
- Wrote and executed my first multi-step PowerShell command using Test-Path, conditional logic, and piping.
- Used [System.Management.Automation.CompletionResult]::New() to create autocompletion functionality in a custom PowerShell command.
- Learned what a REPL is when PowerShell allowed me to correct a mid-block syntax error interactively. Yet I was unable to replicate the behavior when repeating the same error with the exact same code snippet. Still decoding the rules here.
Windows OS
- Used Sysinternals' handle.exe to identify which process was locking a folder when I had "Access Denied" issues.
- Gained clarity on the difference between / and \ in Windows paths. Lesson: default to \ to avoid unexpected behavior, such as accidentally referencing the root directory.
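One way a stray separator can end up pointing at the root, illustrated with Python's pathlib. This is a different context from the PowerShell case I ran into, and the folder names are made up, but the mechanism is similar: a path segment that starts with a separator is treated as rooted.

```python
from pathlib import PureWindowsPath

# PureWindowsPath applies Windows path rules on any operating system.
project = PureWindowsPath(r"C:\Users\me\project")

# A plain segment is appended to the project folder.
relative = project / "data"

# A segment with a leading slash is rooted, so it replaces everything
# after the drive letter.
rooted = project / "/data"

print(relative)  # C:\Users\me\project\data
print(rooted)    # C:\data
```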
August 2, 2025
Systems
- Designed a robust tagging system for my blog that will reveal a strong narrative structure as the archive grows and help readers (and future-me) find content easily.