Portfolio - Pedro

Projects Done During Braskem Internship

Many projects were developed during my Braskem internship, and nearly all of them relied on Python. Other libraries and frameworks were used alongside it; they are listed in the project descriptions below.
Because the projects were carried out inside a company, they are covered by a Non-Disclosure Agreement (NDA), so some sensitive details about them are not shared here.
I appreciate your interest in my journey!

Multimodal Signals ML Classifier
Development of a pipeline where a raw data file containing a signal from a specific material analysis technique is used as input. After automated normalization of this signal, features are extracted and used as input data in a pre-trained Random Forest (sklearn) model to predict the type of material corresponding to the signal. Based on this pipeline and model, a software solution (Python) with a graphical interface (Tkinter) was developed to distribute the tool to analysts. One of the challenges of the project was the limited number of data points for each material, but by defining classes of similar materials, this issue was overcome. The signals used for analysis are also stored for future model improvement. Previously, this analysis took about 30 minutes for experienced analysts and hours for new analysts, but with this solution, it can now be performed in seconds with an accuracy of approximately 75%, while also removing the need for technical knowledge to interpret the signal (democratizing the analysis). For this project, constant communication and meetings with other departments were essential to collect data and monitor software performance.

●pandas

●numpy

●Tkinter

●scikit-learn

●scipy
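
The pipeline described above (normalize, extract features, classify with a Random Forest) can be sketched roughly as follows. This is only an illustration: the real signals, features, and material classes are under NDA, so the synthetic signals, the feature set in `extract_features`, and the class names `class_A` / `class_B` are all assumptions made up for this example. The production tool loads a pre-trained model, whereas this sketch trains a small one on the spot to stay self-contained.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.ensemble import RandomForestClassifier


def normalize(signal):
    """Min-max scale a 1-D signal to the [0, 1] range."""
    signal = np.asarray(signal, dtype=float)
    span = signal.max() - signal.min()
    if span == 0:
        return np.zeros_like(signal)
    return (signal - signal.min()) / span


def extract_features(signal):
    """Hypothetical features: peak count, main-peak position and width, mean, std."""
    scaled = normalize(signal)
    peaks, props = find_peaks(scaled, height=0.2, width=1)
    if len(peaks) == 0:
        return np.array([0.0, 0.0, 0.0, scaled.mean(), scaled.std()])
    main = int(np.argmax(props["peak_heights"]))
    n = len(scaled)
    return np.array([
        float(len(peaks)),
        peaks[main] / n,
        props["widths"][main] / n,
        scaled.mean(),
        scaled.std(),
    ])


def make_signal(center, rng, n_points=200):
    """Synthetic stand-in for a measured signal: one noisy Gaussian peak."""
    x = np.linspace(0.0, 1.0, n_points)
    return np.exp(-0.5 * ((x - center) / 0.05) ** 2) + rng.normal(0.0, 0.02, n_points)


def train_demo_model(seed=0, per_class=30):
    """Fit a Random Forest on synthetic signals from two made-up classes."""
    rng = np.random.default_rng(seed)
    centers = {"class_A": 0.3, "class_B": 0.7}  # invented class definitions
    features, labels = [], []
    for label, center in centers.items():
        for _ in range(per_class):
            signal = make_signal(center + rng.normal(0.0, 0.03), rng)
            features.append(extract_features(signal))
            labels.append(label)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(np.array(features), labels)
    return model


model = train_demo_model()
new_signal = make_signal(0.31, np.random.default_rng(42))
print(model.predict([extract_features(new_signal)])[0])
```

Grouping similar materials into broader classes, as mentioned above, corresponds here to choosing the label set before training, which leaves more examples per label.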

Reconstruction of Overlapping Signals – ML Clustering
Development of a machine learning script using Gaussian Mixture Model clustering (sklearn) to separate and reconstruct signals with overlapping peaks, enabling analyses that were previously impossible due to peak overlap. The main challenges here were related to the consistent and coherent manipulation of signals (scipy) and adapting the solution for scenarios with signal behavior variations. These challenges were addressed through constant collaboration with the technical team to better understand the possibilities. Today, the script is actively used and was a key component in securing patents.

●pandas

●numpy

●scikit-learn

●scipy

●Machine Learning
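
One possible way to separate overlapping peaks with a Gaussian Mixture Model is sketched below. It is not the actual script (which is under NDA) but a minimal illustration that assumes the peaks are roughly Gaussian and the x-axis is uniformly spaced. The signal is treated as a density, x-values are sampled in proportion to intensity, and each fitted mixture component is rescaled back into a signal. The function name `separate_peaks` and its parameters are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture


def separate_peaks(x, y, n_peaks=2, n_samples=20000, seed=0):
    """Split a signal y(x) into n_peaks Gaussian components.

    Assumes a uniformly spaced x grid and roughly Gaussian peak shapes.
    Returns the component means, standard deviations, and one
    reconstructed curve per peak (sorted by peak position).
    """
    x = np.asarray(x, dtype=float)
    y = np.clip(np.asarray(y, dtype=float), 0.0, None)

    # Treat the signal as a probability distribution over x and sample from it.
    rng = np.random.default_rng(seed)
    samples = rng.choice(x, size=n_samples, p=y / y.sum())

    gmm = GaussianMixture(n_components=n_peaks, random_state=seed)
    gmm.fit(samples.reshape(-1, 1))

    # Total area of the signal, used to rescale densities back to intensities.
    area = y.sum() * (x[1] - x[0])

    order = np.argsort(gmm.means_.ravel())
    means = gmm.means_.ravel()[order]
    sigmas = np.sqrt(gmm.covariances_.reshape(-1))[order]
    weights = gmm.weights_[order]

    components = np.array([
        w * area * norm.pdf(x, m, s)
        for w, m, s in zip(weights, means, sigmas)
    ])
    return means, sigmas, components


# Synthetic example: two overlapping Gaussian peaks.
x = np.linspace(0.0, 10.0, 500)
y = 1.0 * norm.pdf(x, 4.0, 0.6) + 0.7 * norm.pdf(x, 6.0, 0.7)
means, sigmas, components = separate_peaks(x, y)
print(means, sigmas)
```

Each row of `components` is one reconstructed peak, and their sum approximates the original signal. Real signals with baselines, asymmetric peaks, or an unknown number of peaks would need extra handling that this sketch leaves out.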

Image Featurization
Development of a method for extracting important data from microscopy images. Extracting data from microscopy images was a challenge for the team, as existing software solutions did not meet specific criteria. I developed scripts to identify and extract relevant data related to artifacts in the images (OpenCV). At the end of the process, the collected data was provided through practical and easy-to-understand visualizations. This method was transformed into a web application (Streamlit), deployed in an Azure container (Docker), and made available to company users, allowing simultaneous access by multiple users.

●pandas

●numpy

●Docker

●OpenCV

●Streamlit
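
As a rough idea of what extracting data about artifacts from an image can look like, the sketch below thresholds an image, labels the connected regions, and collects per-region measurements into a table. Note the swap: the actual project used OpenCV, while this example uses `scipy.ndimage` for the labeling step to keep dependencies minimal. The threshold value, the function name `featurize`, and the chosen measurements are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from scipy import ndimage


def featurize(image, threshold=0.5):
    """Return one row of measurements per connected bright region."""
    mask = np.asarray(image, dtype=float) > threshold
    labels, _ = ndimage.label(mask)  # 4-connected regions by default in 2-D

    rows = []
    for i, sl in enumerate(ndimage.find_objects(labels), start=1):
        region = (labels[sl] == i).astype(float)
        cy, cx = ndimage.center_of_mass(region)
        rows.append({
            "label": i,
            "area_px": int(region.sum()),
            "height_px": region.shape[0],
            "width_px": region.shape[1],
            "centroid_row": sl[0].start + cy,
            "centroid_col": sl[1].start + cx,
        })
    columns = ["label", "area_px", "height_px", "width_px",
               "centroid_row", "centroid_col"]
    return pd.DataFrame(rows, columns=columns)


# Synthetic image with two rectangular "artifacts".
image = np.zeros((20, 20))
image[2:5, 3:7] = 1.0
image[10:15, 10:12] = 1.0
print(featurize(image))
```

A table like this is a convenient input for the visualizations mentioned above, for example a histogram of artifact areas.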

RPA Extracting
Since Braskem is also a research company, it has a lot of equipment for running analytical techniques, and each instrument often comes with its own software for processing the data it generates. Because Braskem is a large company, many analyses are performed, and manually extracting the generated data from this software takes a lot of time. To save analysts' time, several automations (RPA) were developed to extract the data automatically from the software. Some automations also involved extracting data from files such as PDFs and images.

●Tkinter

●pyautogui

●tesseract

●OpenCV

Projects Comparator
Braskem has an enormous project portfolio, and those projects are spread across more than one software system (and more than one database), so it is hard to check by hand, as analysts used to do, whether all the databases hold the correct information. To address this, a program was developed to compare the contents of two databases: a report is generated from each database and used as input, and the program returns an Excel file summarizing what is missing or incorrect in each report.

●pandas

●numpy

●Tkinter

●auto-py-to-exe
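
The comparison step can be illustrated with a small pandas sketch. The real report layouts are confidential, so the key column `project_id`, the `status` field, and the function name `compare_reports` are hypothetical; the idea is an outer merge with `indicator=True` to find rows missing on either side, followed by a field-by-field check of the rows present in both.

```python
import pandas as pd


def compare_reports(report_a, report_b, key="project_id"):
    """List rows missing from either report and fields that disagree."""
    shared = [c for c in report_a.columns if c != key and c in report_b.columns]
    merged = report_a.merge(
        report_b, on=key, how="outer", suffixes=("_a", "_b"), indicator=True
    )

    issues = []
    for _, row in merged.iterrows():
        if row["_merge"] == "left_only":
            issues.append({key: row[key], "issue": "missing in report B", "field": None})
        elif row["_merge"] == "right_only":
            issues.append({key: row[key], "issue": "missing in report A", "field": None})
        else:
            for col in shared:
                a, b = row[f"{col}_a"], row[f"{col}_b"]
                if pd.isna(a) and pd.isna(b):
                    continue  # both empty counts as a match
                if a != b:
                    issues.append({key: row[key],
                                   "issue": f"mismatch: {a!r} vs {b!r}",
                                   "field": col})
    return pd.DataFrame(issues, columns=[key, "issue", "field"])


# Made-up reports for illustration.
report_a = pd.DataFrame({"project_id": [1, 2, 3],
                         "status": ["open", "closed", "open"]})
report_b = pd.DataFrame({"project_id": [2, 3, 4],
                         "status": ["closed", "done", "open"]})
summary = compare_reports(report_a, report_b)
print(summary)
```

The resulting summary could then be written out with `summary.to_excel("summary.xlsx", index=False)` (the file name is a placeholder, and an Excel writer such as openpyxl must be installed).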

Utils Functions
Even though not every project worked out as expected, each one generated knowledge in one way or another. With that in mind, some parts of the projects were turned into functions and kept separate, so the code can be reused in future projects without rewriting it from scratch. These functions cover all kinds of tasks, from image editing to string comparison and matrix sampling. All of them were commented and well documented.

●all of the technologies above and probably more