Computers handle a lot more than we realize. Take, for example, a coworker sending a zip file by email: to read the documents inside, the recipient has to decompress the file, which consumes memory and processing time. As more and more data is processed and stored, computers begin to show performance issues and slow down.
Fahad Saeed, associate professor in the School of Computing & Information Sciences within the College of Engineering & Computing, and Muhammad Haseeb, a computer science doctoral student, recognize this universal challenge and specialize in enhancing the compression and computing of big data.
Recently, the team of computer scientists was awarded a patent titled “Methods and Systems for Compressing Data,” which reduces the size of proteomics data in a desktop computer’s memory while ensuring the data sets can be processed without decompression. Proteomics refers to the large-scale study of proteins.
To make sense of these data sets, Saeed and Haseeb are using specific techniques to handle big data.
“Since mass spectrometry data, which measures proteomics, is large, the index that holds that data is also becoming very large,” said Saeed. “It’s getting to a point where our everyday laptops and desktops cannot handle the size of the index.”
Indexing allows computer algorithms to identify the particular data sets users are looking for – the location of folders, files, or records. An index pinpoints data based on file names, text within a file, or unique characteristics found within a graphic or video file.
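To illustrate the idea in general terms (this is a textbook-style sketch, not the researchers’ patented system), an index maps each search term to the files that contain it, so a lookup can jump straight to the matching locations instead of scanning every file. The file names and contents below are hypothetical.

```python
# A minimal inverted index: map each word to the set of files
# where it appears. The files here are hypothetical examples.
files = {
    "notes.txt": "protein mass spectrometry data",
    "report.txt": "compression of spectrometry data",
}

index = {}
for name, text in files.items():
    for word in text.split():
        index.setdefault(word, set()).add(name)

# Find every file that mentions "spectrometry" without re-reading the files.
print(sorted(index["spectrometry"]))  # → ['notes.txt', 'report.txt']
```

The trade-off the researchers describe follows directly from this structure: as the underlying data grows, the index itself grows with it.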
The solution? A compression technique that compresses the index of the data rather than the data itself. “Initially, we thought of creating an index with terabyte files, but then the index itself would still remain too large,” said Haseeb. “By compressing the index, we make it smaller, and we can find data more quickly and efficiently without the need for decompression.”
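One common way to make an index of sorted integer IDs both smaller and directly searchable (a standard technique used here purely for illustration, not necessarily the patented method) is delta encoding: store the gaps between consecutive IDs instead of the IDs themselves. The gaps are small numbers that take less space, and a query can scan the gap list with a running sum, reconstructing IDs on the fly, so no separate decompression step ever materializes the full index.

```python
def delta_encode(ids):
    """Turn a sorted list of IDs into a list of gaps between neighbors."""
    return [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]

def contains(gaps, target):
    """Search the gap-encoded list for `target` via a running-sum scan,
    without first rebuilding the full list of IDs."""
    current = 0
    for gap in gaps:
        current += gap
        if current == target:
            return True
        if current > target:  # IDs are sorted, so we've passed the target
            return False
    return False

ids = [3, 7, 8, 15, 40]
gaps = delta_encode(ids)   # [3, 4, 1, 7, 25] — smaller values, cheaper to store
print(contains(gaps, 15))  # True
print(contains(gaps, 16))  # False
```

The design point matches Haseeb’s description: the compressed form is itself queryable, so searching stays fast while the memory footprint shrinks.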
So how does this innovation benefit people in their day-to-day lives? “We want to move toward computational techniques that will allow patients to receive personalized medical attention,” adds Saeed.
If mass spectrometry data processing becomes more efficient with the compression technique, clinicians could more easily and quickly identify which protein biomarkers are present in their patients’ bodies. If a protein biomarker is present, medical professionals can then tailor the medicine prescribed to their patients. Biomarkers include genes and genetic variations and serve as a guide for medical professionals to detect diseases such as Alzheimer’s disease or cancer.
Saeed, a National Science Foundation CAREER awardee, specializes in computer algorithms, big data, high-performance computing, and computational systems biology. Earlier this year, Saeed was awarded a $1 million grant from the National Institutes of Health to design and develop machine-learning algorithms for biologists to make sense of proteomics.