In this tutorial, you will learn how to use Open Interpreter together with OCR to solve math and science questions. The Jupyter notebook can be found here.
Are you looking for other AI use cases? Please check my AI use cases page.
For testing purposes, I took a random math question from the SAT. If we simply use a conventional OCR tool, the output will be poor, since conventional OCR mostly fails to read math symbols. Paid solutions like mathpix.com are very good at reading math symbols, but for this project we want something open-source.
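Fortunately, Meta AI recently released Nougat, an OCR model trained specifically on scientific documents, so it handles math and science notation well. The code is MIT-licensed, but the model weights are published under CC-BY-NC, so check the license terms before any commercial use.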
Let’s use Google Colab as our development environment. I recommend selecting the T4 GPU runtime to speed up the OCR step. To install the dependencies in Colab, run:
!pip install tokenizers==0.14.0 nougat-ocr transformers open-interpreter
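Since the T4 GPU mainly matters for OCR speed, it can be worth confirming that the runtime really has a GPU attached before going further. The following is a minimal sketch and not part of the original notebook: it assumes the `nvidia-smi` tool is on the PATH of GPU runtimes, and the helper name `gpu_visible` is made up for illustration.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    # Assumption: nvidia-smi ships with the NVIDIA driver and is
    # missing on CPU-only runtimes.
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the attached GPUs, one per line
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_visible())
```

If this prints `False`, the OCR step should still run on the CPU, only more slowly; switching the Colab runtime type to T4 and re-running the install cell is the likely fix.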
I have a PDF containing the question; PDF is the only input format nougat accepts.
Run the following command to download the PDF into Google Colab:
!wget -O math_questions.pdf "https://firebasestorage.googleapis.com/v0/b/t20med-oficial.appspot.com/o/tutorials%2FUntitled%201.pdf?alt=media&token=41a41615-88a9-496b-a7b9-b8418e02c086"
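A download that fails (for example because of an expired token) can still leave a file named math_questions.pdf behind that contains an error page instead of a PDF. As a quick sanity check, the sketch below tests whether the file starts with the `%PDF-` signature. The helper name `looks_like_pdf` is hypothetical, and the check only looks at the first bytes; it does not validate the rest of the file.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # PDF files begin with the bytes "%PDF-" followed by a version.
        return f.read(5) == b"%PDF-"


print("math_questions.pdf looks like a PDF:", looks_like_pdf("math_questions.pdf"))
```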
With the PDF downloaded to our Colab environment, run the nougat CLI to read it and write the result in Mathpix Markdown format:
! nougat math_questions.pdf -o . --markdown --recompute
You will see a math_questions.mmd file in your environment. Download it and check the content; it should look similar to this:
If \(y=x^{3}+2x+5\) and \(z=x^{2}+7x+1\), what is \(2y+z\) in terms of \(x\)?
1. \(3x^{3}+11x+11\)
2. \(2x^{3}+x^{2}+9x+6\)
3. \(2x^{3}+x^{2}+11x+11\)
4. \(2x^{3}+2x^{2}+18x+12\)
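It helps to know the expected answer before handing the question to a model. Expanding by hand, 2y + z = (2x^3 + 4x + 10) + (x^2 + 7x + 1) = 2x^3 + x^2 + 11x + 11, which is option 3. The sketch below repeats that check in plain Python. Representing each polynomial as an `{exponent: coefficient}` dict is only an illustrative choice, and the helper names `add_poly` and `scale_poly` are made up for this example.

```python
def add_poly(p, q):
    """Add two polynomials given as {exponent: coefficient} dicts."""
    out = dict(p)
    for exp, coef in q.items():
        out[exp] = out.get(exp, 0) + coef
    # Drop zero terms so equal polynomials compare equal as dicts.
    return {e: c for e, c in out.items() if c != 0}


def scale_poly(p, k):
    """Multiply every coefficient of p by the constant k."""
    return {e: k * c for e, c in p.items() if k * c != 0}


y = {3: 1, 1: 2, 0: 5}  # x^3 + 2x + 5
z = {2: 1, 1: 7, 0: 1}  # x^2 + 7x + 1
target = add_poly(scale_poly(y, 2), z)  # 2y + z

# The four answer options as transcribed from the OCR output.
options = {
    1: {3: 3, 1: 11, 0: 11},
    2: {3: 2, 2: 1, 1: 9, 0: 6},
    3: {3: 2, 2: 1, 1: 11, 0: 11},
    4: {3: 2, 2: 2, 1: 18, 0: 12},
}
matches = [n for n, poly in options.items() if poly == target]
print(matches)  # → [3]
```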